CookHLA
Web: https://github.com/WansonChoi/CookHLA
Installation
According to the GitHub page above, the following steps are required.
# 0. CookHLA
git clone https://github.com/WansonChoi/CookHLA.git
cd CookHLA
# 1. PLINK 1.9
module load plink-1.9-gcc-5.4.0-sm3ojoi
mkdir dependency
cd dependency
# 2. mach1
wget -qO- https://csg.sph.umich.edu/abecasis/MACH/download/mach.1.0.18.Linux.tgz | \
# 3. beagle 5.1
wget https://faculty.washington.edu/browning/beagle/beagle.18May20.d20.jar -O beagle.jar
tar xvfz -
# 4. beagle utilities
wget https://faculty.washington.edu/browning/beagle_utilities/beagle2linkage.jar
wget https://faculty.washington.edu/browning/beagle_utilities/beagle2vcf.jar
wget https://faculty.washington.edu/browning/beagle_utilities/linkage2beagle.jar
wget https://faculty.washington.edu/browning/beagle_utilities/vcf2beagle.jar
wget https://faculty.washington.edu/browning/beagle_utilities/transpose.jar
# 6. Python
source ~/COVID-19/py37/bin/activate
pip install pyliftover==0.4
where ~/COVID-19/py37 is our Python virtual environment under Python version 3.7, and pyliftover is the Python liftover package.
Nevertheless, the dependency directory already contains plink, beagle5.jar (can be renamed into beagle.jar), mach1 and all BEAGLE utilities.
Example
The example mirrors those in SNP2HLA1,
# Adaptive Genetic Map
python -m MakeGeneticMap \
-i example/1958BC.hg19 \
-hg 19 \
-ref 1000G_REF/1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC \
-o work/1958BC+1000G_REF.EUR
# Imputation
python CookHLA.py \
-i example/1958BC.hg19 \
-hg 19 \
-o work/1958BC+HM_CEU_REF \
-ref example/HM_CEU_REF \
-gm example/AGM.1958BC+HM_CEU_REF.mach_step.avg.clpsB \
-ae example/AGM.1958BC+HM_CEU_REF.aver.erate \
-mem 20g \
-mp 8
The imputation gives 1958BC+HM_CEU_REF.MHC.HLA_IMPUTATION_OUT .alleles and .hped files, which is handled with HATK2 and HLA-TAPAS3. The software also takes output from HIBAG4, among others.
1000Genomes
We are rather tempted to use these from CookHLA with SNP2HLA, and follow the footnote on SNP2HLA we have,
csh SNP2HLA.csh 1958BC 1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC 1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC plink
plink --noweb --dosage 1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC.dosage noheader format=1 \
--fam 1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC.fam \
--linear --out 1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC.dosage.assoc
we obtain the screen output,
SNP2HLA: Performing HLA imputation for dataset 1958BC
- Java memory = 2000Mb
- Beagle window size = 1000 markers
[1] Extracting SNPs from the MHC.
[2] Performing SNP quality control.
[3] Convering data to beagle format.
[4] Performing HLA imputation (see 1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC.bgl.log for progress).
[5] Converting posterior probabilities to PLINK dosage format.
[6] Converting imputation genotypes to PLINK .ped format.
DONE!
PLINK v1.90p 64-bit (8 Nov 2021) www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to 1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC.dosage.assoc.log.
Options in effect:
--dosage 1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC.dosage noheader format=1
--fam 1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC.fam
--linear
--noweb
--out 1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC.dosage.assoc
Note: --dosage automatically performs a regression; --linear/--logistic has no
additional effect.
Note: --noweb has no effect since no web check is implemented yet.
257130 MB RAM detected; reserving 128565 MB for main workspace.
10 people (9 males, 1 female) loaded from .fam.
10 phenotype values loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
10 people pass filters and QC.
Among remaining phenotypes, 0 are cases and 10 are controls.
--dosage: Reading from
1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC.dosage.
--dosage: Results saved to
1958BC_IMPUTED_1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC.dosage.assoc.assoc.dosage
References
Choi W, Luo Y, Raychaudhuri S & Han B HATK: HLA analysis toolkit. Bioinformatics 37, 416-418 (2020).
Cook S, et al. Accurate imputation of human leukocyte antigens with CookHLA. Nat Comm 12, 1264 (2021).
Degenhardt F, et al. Construction and benchmarking of a multi-ethnic reference panel for the imputation of HLA class I and II alleles. Hum Mol Genet 28, 2078-2092 (2018).
Jia X, et al. Imputing Amino Acid Polymorphisms in Human Leukocyte Antigens. PLOS ONE 8, e64683 (2013).
Immuno Polymorphism Database-international ImMunoGeneTics project (IMGT) (IPD-IMGT/HLA), https://www.ebi.ac.uk/ipd/imgt/hla/
Luo, Y, et al. A high-resolution HLA reference panel capturing global population diversity enables multi-ancestry fine-mapping in HIV host response. Nat Genet 53, 1504–1516 (2021), https://doi.org/10.1038/s41588-021-00935-7.
WHO Committe. Nomenclature for Factors of the HLA System, http://hla.alleles.org/.
Zheng X, et al. HIBAG—HLA genotype imputation with attribute bagging. The Pharmacogenomics J 14, 192-200 (2014), HLARES.
-
SNP2HLA
Web: SNP2HLA v1.0.3 (BEAGLE utitlities)
Installation
It turns out to be difficult to download beagle 3.0.4 as indicated so it is included in the files/ directory (beagle 3.0.4, documentation, example).
wget -qO- https://software.broadinstitute.org/mpg/snp2hla/data/SNP2HLA_package_v1.0.3.tar.gz | \ tar xvfz - cd SNP2HLA_package_v1.0.3/SNP2HLA # add beagle2linkage.jar as above # test.sh is adapted from SNP2HLA.csh by removing argument checking and as an executable. # The .dos file described in README.txt is actually the .dosage file generated internally from test.sh # The association statistics will be in .assoc.assoc.dosage; the double .assoc guarantees an .assoc.log file. csh SNP2HLA.csh 1958BC HM_CEU_REF 1958BC_IMPUTED plink csh ParseDosage.csh 1958BC_IMPUTED.bgl.gprobs > 1958BC_IMPUTED.bgl.dos plink --noweb --dosage 1958BC_IMPUTED.dosage noheader format=1 --fam 1958BC_IMPUTED.fam --logistic --out 1958BC_IMPUTED.assoc
As indicated, we call the .csh scripts with
csh
which is actuallytsch
on csd3.Example
The screen output is as follows,
SNP2HLA: Performing HLA imputation for dataset 1958BC - Java memory = 2000Mb - Beagle window size = 1000 markers [1] Extracting SNPs from the MHC. [2] Performing SNP quality control. [3] Convering data to beagle format. [4] Performing HLA imputation (see 1958BC_IMPUTED.bgl.log for progress). [5] Converting posterior probabilities to PLINK dosage format. [6] Converting imputation genotypes to PLINK .ped format. DONE!
The output is 1958BC_IMPUTED in PLINK binary format.
MakeReference
We first create
MakeReference.tcsh
such that calls to .pl scripts are prefixed withperl
, so thatdiff MakeReference.csh MakeReference.tcsh
77c77 < ./HLAtoSequences.pl $HLA_DATA HLA_DICTIONARY_AA.txt AA > $OUTPUT.AA.ped --- > perl HLAtoSequences.pl $HLA_DATA HLA_DICTIONARY_AA.txt AA > $OUTPUT.AA.ped 81c81 < ./encodeVariants.pl $OUTPUT.AA.ped $OUTPUT.AA.map $OUTPUT.AA.CODED --- > perl encodeVariants.pl $OUTPUT.AA.ped $OUTPUT.AA.map $OUTPUT.AA.CODED 93c93 < ./encodeHLA.pl $HLA_DATA $OUTPUT.HLA.map > $OUTPUT.HLA.ped --- > perl encodeHLA.pl $HLA_DATA $OUTPUT.HLA.map > $OUTPUT.HLA.ped 100c100 < ./HLAtoSequences.pl $HLA_DATA HLA_DICTIONARY_SNPS.txt SNPS > $OUTPUT.SNPS.ped --- > perl HLAtoSequences.pl $HLA_DATA HLA_DICTIONARY_SNPS.txt SNPS > $OUTPUT.SNPS.ped 104c104 < ./encodeVariants.pl $OUTPUT.SNPS.ped $OUTPUT.SNPS.map $OUTPUT.SNPS.CODED ---
tcsh MakeReference.tcsh HAPMAP_CEU HAPMAP_CEU_HLA.ped HM_CEU_REF plink
tcsh MakeReference.tcsh HAPMAP_CEU HAPMAP_CEU_HLA.ped HM_CEU_REF plink Creating reference panel: HM_CEU_REF [1] Generating amino acid sequences from HLA types. [2] Encoding amino acids positions. [3] Encoding HLA alleles. [4] Generating DNA sequences from HLA types. [5] Encoding SNP positions. [6] Extracting founders. [7] Merging SNP, HLA, and amino acid datasets. Warning: Variants 'SNP_A_30018350' and 'AA_A_-11_30018350' have the same position. Warning: Variants 'SNP_A_30018461_G' and 'SNP_A_30018461_A' have the same position. Warning: Variants 'SNP_A_30018461_T' and 'SNP_A_30018461_G' have the same position. 987 more same-position warnings: see log file. [8] Performing quality control. [9] Preparing files for Beagle. [10] Converting to beagle format. [11] Phasing reference using Beagle (see progress in HM_CEU_REF.bgl.log). [12] Done.
-
HATK
Web: https://github.com/WansonChoi/HATK
Installation
git clone https://github.com/WansonChoi/HATK
Example
source ~/COVID-19/py37/bin/activate python3 HATK.py \ --variants example/wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18 \ --hped example/wtccc_filtered_58C_RA.hatk.300+300.hped \ --2field \ --pheno example/wtccc_filtered_58C_RA.hatk.300+300.phe \ --pheno-name RA \ --out work/RESULT_EXAMPLE_wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18 \ --imgt 3320 \ --hg 18 \ --imgt-dir example/IMGTHLA3320 \ --multiprocess 2
This is from the documentation, where
--variants
reads in the genotype files and--hped
the .hped file to be followed by specification of the RA phenotype in a logistic regression. Note that the example is more desirable compared to the toy data in SNP2HLA given its 600 samples and 29,373 variants; we proceed with the imputation with the 1000Genomes panel provided with CookHLA.export hatk=${HPC_WORK}/HATK export cookhla=${HPC_WORK}/CookHLA cd ${cookhla} python CookHLA.py \ -i ${hatk}/example/wtccc_filtered_58C_RA.hatk.300+300.chr6.hg18 \ -hg 18 \ -o ${hatk}/work/hla_CookHLA \ -ref ${cookhla}/1000G_REF/1000G_REF.EUR.chr6.hg18.29mb-34mb.inT1DGC \ -gm ${caprion}/hla_IMPUTED.mach_step.avg.clpsB \ -ae ${caprion}/hla_IMPUTED.aver.erate \ -mem 20g \ -mp 8 cd -
-
HLA-TAPAS
Web: https://github.com/immunogenomics/HLA-TAPAS
Installation
git clone https://github.com/immunogenomics/HLA-TAPAS
Example
python HLA-TAPAS.py \ --target example/Case+Control.300+300.chr6.hg18 \ --reference example/1000G.EUR.chr6.hg18.28mb-35mb \ --hped-Ggroup example/1000G.EUR.Ggroup.hped \ --pheno example/Case+Control.300+300.phe \ --hg 18 \ --out MyHLA-TAPAS/Case+Control+1000G_EUR_REF \ --mem 4g \ --nthreads 4
-
HIBAG
Web: https://hibag.s3.amazonaws.com/index.html (Bioconductor)
Installation
Rscript -e 'BiocManager:install("HIBAG")'
Examples