ldsc
Web: https://github.com/bulik/ldsc (Google group).
CSD3 module
The module is loaded with
module load ceuadmin/ldsc
and one can run python ldsc.py
smoothly as documented.
Installation
We proceed as follows for installation into HPC_WORK=/rds/user/$USER/hpc-work,
module load python/2.7
virtualenv py27
source py27/bin/activate
cd ${HPC_WORK}
git clone https://github.com/bulik/ldsc
cd ldsc
pip install -r requirements.txt
wget https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz2 | \
bzip2 -d
The last two commands also gets the HapMap 3 SNP list. It is worthwhile to note that
- The ldsc documentation suggests Anaconda and on CSD3 we could use the miniconda/2 module, i.e, https://cambridge-ceu.github.io/csd3/systems/software.html#python, but it is considerably more involved.
- We only need
module load python/2.7
andsource ${HOME}/py27/bin/activate
later on.
Testing
We use the GIANT BMI data,
wget -qO- http://portals.broadinstitute.org/collaboration/giant/images/c/c8/Meta-analysis_Locke_et_al%2BUKBiobank_2018_UPDATED.txt.gz | \
gunzip -c > BMI.txt
python munge_sumstats.py --sumstats BMI.txt --a1 Tested_Allele --a2 Other_Allele --merge-alleles w_hm3.snplist --out BMI --a1-inc
Note the munging procedure requests large resources and will be terminated by CSD3, so we better test with a SLURM job instead.
Analysis
Heritability partition
We now complete the download on frequencies, baseline model LD scores, and regression weights and furnish this.
wget -qO- https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/1000G_Phase1_frq.tgz | tar xvfz -
wget -qO- https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/1000G_Phase1_baseline_ldscores.tgz | tar xvfz -
wget -qO- https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/weights_hm3_no_hla.tgz | tar xvfz -
Our batch file is as follows,
#!/usr/bin/bash
#SBATCH --job-name=BMI
#SBATCH --account CARDIO-SL0-CPU
#SBATCH --partition cardio
#SBATCH --qos=cardio
#SBATCH --mem=28800
#SBATCH --time=5-00:00:00
#SBATCH --output=/rds/user/jhz22/hpc-work/work/_BMI_%A_%a.out
#SBATCH --error=/rds/user/jhz22/hpc-work/work/_BMI_%A_%a.err
#SBATCH --export ALL
export TMPDIR=/rds/user/$USER/hpc-work/work
module load python/2.7
source ${HOME}/py27/bin/activate
cd ${HPC_WORK}/ldsc
python munge_sumstats.py --sumstats BMI.txt --a1 Tested_Allele --a2 Other_Allele --merge-alleles w_hm3.snplist --out BMI --a1-inc
python ldsc.py\
--h2 BMI.sumstats.gz\
--ref-ld-chr baseline/baseline.\
--w-ld-chr weights_hm3_no_hla/weights.\
--overlap-annot\
--frqfile-chr 1000G_frq/1000G.mac5eur.\
--out BMI_baseline
and our results are contained in the tab-delimited file named BMI_baseline.result
– note in particular the CNS enrichment P=8.30e-24.
Cell type analysis
We carry on with the download,
wget -qO- https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/1000G_Phase1_cell_type_groups.tgz | \
tar xvfz -
and for CNS, we have
python ldsc.py\
--h2 BMI.sumstats.gz\
--w-ld-chr weights_hm3_no_hla/weights.\
--ref-ld-chr cell_type_groups/CNS.,baseline/baseline.\
--overlap-annot\
--frqfile-chr 1000G_frq/1000G.mac5eur.\
--out BMI_CNS\
--print-coefficients
and our results are now contained in the tab-delimited file named BMI_CNS.results
.
We next use the --h2-cts
option with the Cahoy data analysis in a nutshell,
export cts_name=Cahoy
wget -qO- https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/LDSC_SEG_ldscores/${cts_name}_1000Gv3_ldscores.tgz | \
tar xfvz -
wget -qO- https://storage.googleapis.com/broad-alkesgroup-public/LDSCORE/1000G_Phase3_baseline_ldscores.tgz | \
tar xvfz -
ldsc.py\
--h2-cts BMI.sumstats.gz \
--ref-ld-chr 1000G_EUR_Phase3_baseline/baseline. \
--out BMI_${cts_name} \
--ref-ld-chr-cts $cts_name.ldcts \
--w-ld-chr weights_hm3_no_hla/weights.
The output BMI_Cahoy.cell_type_results.txt
is sufficiently small to include here,
Name | Coefficient | Coefficient_std_error | Coefficient_P_value |
---|---|---|---|
Neuron | 4.4874060288359995e-09 | 2.48025909733557e-09 | 0.035206172355899706 |
Oligodendrocyte | 8.067689953393081e-10 | 2.569340962599481e-09 | 0.376761120478732 |
Astrocyte | -4.036699628095808e-09 | 2.0886996416620756e-09 | 0.9733595763245972 |
In line with the finding above, we have a P=0.035 for neurons.
Genetic correlation
We carry on to calculate the genetic correlation (rg) between BMI and height. First, we obtain the LD scores,
wget -qO- https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2 | \
tar xjf -
followed by downloading and munging height GWAS summary statistics
wget -qO- http://portals.broadinstitute.org/collaboration/giant/images/0/01/GIANT_HEIGHT_Wood_et_al_2014_publicrelease_HapMapCeuFreq.txt.gz | \
gunzip -c > height.txt
python munge_sumstats.py --sumstats height.txt \
--snp MarkerName --a1 Allele1 --a2 Allele2 --merge-alleles w_hm3.snplist --p p --out height --a1-inc
but again it will be killed and we need a SLURM job as above. On CSD3, it took a staggering 14hr.
Our analysis then proceeds with
python ldsc.py \
--rg BMI.sumstats.gz,height.sumstats.gz \
--ref-ld-chr eur_w_ld_chr/ \
--w-ld-chr eur_w_ld_chr/ \
--out BMI_height
It took just under 16s and BMI_height.log
contains the desired output quoted below
*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call:
./ldsc.py \
--ref-ld-chr eur_w_ld_chr/ \
--out BMI_height \
--rg BMI.sumstats.gz,height.sumstats.gz \
--w-ld-chr eur_w_ld_chr/
Beginning analysis at Wed Aug 4 07:11:02 2021
Reading summary statistics from BMI.sumstats.gz ...
Read summary statistics for 1019865 SNPs.
Reading reference panel LD Score from eur_w_ld_chr/[1-22] ... (ldscore_fromlist)
Read reference panel LD Scores for 1290028 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from eur_w_ld_chr/[1-22] ... (ldscore_fromlist)
Read regression weight LD Scores for 1290028 SNPs.
After merging with reference panel LD, 1014995 SNPs remain.
After merging with regression SNP LD, 1014995 SNPs remain.
Computing rg for phenotype 2/2
Reading summary statistics from height.sumstats.gz ...
Read summary statistics for 1217311 SNPs.
After merging with summary statistics, 1014995 SNPs remain.
993172 SNPs with valid alleles.
Heritability of phenotype 1
---------------------------
Total Observed scale h2: 0.2104 (0.0066)
Lambda GC: 2.7872
Mean Chi^2: 3.9573
Intercept: 1.0292 (0.0298)
Ratio: 0.0099 (0.0101)
Heritability of phenotype 2/2
-----------------------------
Total Observed scale h2: 0.342 (0.0176)
Lambda GC: 2.0007
Mean Chi^2: 2.9726
Intercept: 1.2239 (0.033)
Ratio: 0.1135 (0.0167)
Genetic Covariance
------------------
Total Observed scale gencov: 0.1517 (0.0047)
Mean z1*z2: 2.0178
Intercept: 0.738 (0.0148)
Genetic Correlation
-------------------
Genetic Correlation: 0.5656 (0.0081)
Z-score: 70.1339
P: 0.
Summary of Genetic Correlation Results
p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se
BMI.sumstats.gz height.sumstats.gz 0.5656 0.0081 70.1339 0.0 0.342 0.0176 1.2239 0.033 0.738 0.0148
Analysis finished at Wed Aug 4 07:11:18 2021
Total time elapsed: 15.56s
and rg=0.57.