regenie

Web: https://github.com/rgcgithub/regenie (documentation: https://rgcgithub.github.io/regenie/)

The easiest option is to use the precompiled CentOS 7 binary:

export version=v2.2.4
wget -qO- https://github.com/rgcgithub/regenie/releases/download/${version}/regenie_${version}.gz_x86_64_Centos7_mkl.zip | \
gunzip -c > regenie_${version}
chmod +x regenie_${version}
ln -sf regenie_${version} regenie
regenie --help

The last command prints the following help text (it is unclear why .gz appears in the banner version):

              |=============================|
              |      REGENIE v2.2.4.gz      |
              |=============================|

Copyright (c) 2020-2021 Joelle Mbatchou, Andrey Ziyatdinov and Jonathan Marchini.
Distributed under the MIT License.
Compiled with Boost Iostream library.
Using Intel MKL with Eigen.


Usage:
  regenie [OPTION...]

  -h, --help      print list of available options
      --helpFull  print list of all available options

 Main options:
      --step INT                specify if fitting null model (=1) or
                                association testing (=2)
      --bed PREFIX              prefix to PLINK .bed/.bim/.fam files
      --pgen PREFIX             prefix to PLINK2 .pgen/.pvar/.psam files
      --bgen FILE               BGEN file
      --sample FILE             sample file corresponding to BGEN file
      --ref-first               use the first allele as the reference for
                                BGEN or PLINK bed/bim/fam input format [default
                                assumes reference is last]
      --keep FILE               comma-separated list of files listing samples
                                to retain in the analysis (no header; starts
                                with FID IID)
      --remove FILE             comma-separated list of files listing samples
                                to remove from the analysis (no header;
                                starts with FID IID)
      --extract FILE            comma-separated list of files with IDs of
                                variants to retain in the analysis
      --exclude FILE            comma-separated list of files with IDs of
                                variants to remove from the analysis
  -p, --phenoFile FILE          phenotype file (header required starting with
                                FID IID)
      --phenoCol STRING         phenotype name in header (use for each
                                phenotype to keep; can use parameter expansion
                                {i:j})
      --phenoColList STRING,..,STRING
                                comma separated list of phenotype names to
                                keep (can use parameter expansion {i:j})
  -c, --covarFile FILE          covariate file (header required starting with
                                FID IID)
      --covarCol STRING         covariate name in header (use for each
                                covariate to keep; can use parameter expansion
                                {i:j})
      --covarColList STRING,..,STRING
                                comma separated list of covariate names to
                                keep (can use parameter expansion {i:j})
      --catCovarList STRING,..,STRING
                                comma separated list of categorical
                                covariates
  -o, --out PREFIX              prefix for output files
      --qt                      analyze phenotypes as quantitative
      --bt                      analyze phenotypes as binary
  -1, --cc12                    use control=1,case=2,missing=NA encoding for
                                binary traits
  -b, --bsize INT               size of genotype blocks
      --cv INT(=5)              number of cross validation (CV) folds
      --loocv                   use leave-one out cross validation (LOOCV)
      --l0 INT(=5)              number of ridge parameters to use when
                                fitting models within blocks [evenly spaced in
                                (0,1)]
      --l1 INT(=5)              number of ridge parameters to use when
                                fitting model across blocks [evenly spaced in (0,1)]
      --lowmem                  reduce memory usage by writing level 0
                                predictions to temporary files
      --lowmem-prefix PREFIX    prefix where to write the temporary files in
                                step 1 (default is to use prefix from --out)
      --split-l0 PREFIX,N       split level 0 across N jobs and set prefix of
                                output files
      --run-l0 FILE,K           run level 0 for job K in {1..N} using master
                                file created from '--split-l0'
      --run-l1 FILE             run level 1 using master file from
                                '--split-l0'
      --keep-l0                 avoid deleting the level 0 predictions
                                written on disk after fitting the level 1 models
      --strict                  remove all samples with missingness at any of
                                the traits
      --print-prs               also output polygenic predictions without
                                using LOCO (=whole genome PRS)
      --gz                      compress output files (gzip format)
      --apply-rint              apply Rank-Inverse Normal Transformation to
                                quantitative traits
      --threads INT             number of threads
      --pred FILE               file containing the list of predictions files
                                from step 1
      --ignore-pred             skip reading predictions from step 1
                                (equivalent to linear/logistic regression with only
                                covariates)
      --use-prs                 when using whole genome PRS step 1 output in
                                '--pred'
      --write-samples           write IDs of samples included for each trait
                                (only in step 2)
      --minMAC FLOAT(=5)        minimum minor allele count (MAC) for tested
                                variants
      --minINFO DOUBLE(=0)      minimum imputation info score (Impute/Mach
                                R^2) for tested variants
      --no-split                combine asssociation results into a single
                                for all traits
      --firth                   use Firth correction for p-values less than
                                threshold
      --approx                  use approximation to Firth correction for
                                computational speedup
      --spa                     use Saddlepoint approximation (SPA) for
                                p-values less than threshold
      --pThresh FLOAT(=0.05)    P-value threshold below which to apply
                                Firth/SPA correction
      --write-null-firth        store coefficients from null models with
                                approximate Firth for step 2
      --compute-all             store Firth estimates for all chromosomes
      --use-null-firth FILE     use stored coefficients for null model in
                                approximate Firth
      --chr STRING              specify chromosome to test in step 2 (use for
                                each chromosome)
      --chrList STRING,..,STRING
                                Comma separated list of chromosomes to test
                                in step 2
      --range CHR:MINPOS-MAXPOS
                                to specify a physical position window for
                                variants to test in step 2
      --sex-specific STRING     for sex-specific analyses (male/female)
      --af-cc                   print effect allele frequencies among
                                cases/controls for step 2
      --test STRING             'additive', 'dominant' or 'recessive'
                                (default is additive test)
      --set-list FILE           file with sets definition
      --extract-sets FILE       comma-separated list of files with IDs of
                                sets to retain in the analysis
      --exclude-sets FILE       comma-separated list of files with IDs of
                                sets to remove from the analysis
      --extract-setlist STRING  comma separated list of sets to retain in the
                                analysis
      --exclude-setlist STRING  comma separated list of sets to remove from
                                the analysis
      --anno-file FILE          file with variant annotations
      --anno-labels FILE        file with labels to annotations
      --mask-def FILE           file with mask definitions
      --aaf-file FILE           file with AAF to use when building masks
      --aaf-bins FLOAT,..,FLOAT
                                comma separated list of AAF bins cutoffs for
                                building masks
      --build-mask STRING       rule to construct masks, can be 'max', 'sum'
                                or 'comphet' (default is max)
      --singleton-carrier       define singletons as variants with a single
                                carrier in the sample
      --write-mask              write masks in PLINK bed/bim/fam format
      --mask-lovo STRING        apply Leave-One-Variant-Out (LOVO) scheme
                                when building masks
                                (<set_name>,<mask_name>,<aaf_cutoff>)
      --mask-lodo STRING        apply Leave-One-Domain-Out (LODO) scheme when
                                building masks
                                (<set_name>,<mask_name>,<aaf_cutoff>)
      --skip-test               skip computing association tests after
                                building masks
      --check-burden-files      check annotation file, set list file and mask
                                file for consistency
      --strict-check-burden     to exit early if the annotation, set list and
                                mask definition files don't agree

For more information, use option '--help' or visit the website: https://rgcgithub.github.io/regenie/

3.2.7

This version can be compiled from source:

cd ~/rds/public_databases/software/
wget -qO- https://github.com/rgcgithub/regenie/archive/refs/tags/v3.2.7.tar.gz | \
tar xvfz -
cd regenie-3.2.7/
export BGEN_PATH=~/rds/public_databases/software/bgen
module load zlib/1.2.11
export ZLIB_LIBRARY=/usr/local/Cluster-Apps/zlib/1.2.11
module load gcc/6
module load cmake-3.19.7-gcc-5.4-5gbsejo
module load intel/mkl/mic/2018.4
mkdir build
cd build
cmake ..
make

Here BGEN_PATH and ZLIB_LIBRARY point to the BGEN and zlib installations; the gcc/6 and cmake-3.19.7 modules are also needed to get around some other build errors.
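Since the build depends on those two locations, a quick check before running cmake can save a failed configure step. The following is only a minimal sketch: check_dir is a hypothetical helper introduced here for illustration, not part of regenie or its build system, and it only tests that each variable names an existing directory.

```shell
# check_dir is a hypothetical helper (not part of regenie).
# $1 = variable name (used only in the message), $2 = its value.
check_dir() {
  if [ -n "$2" ] && [ -d "$2" ]; then
    echo "ok: $1=$2"
  else
    echo "missing: $1='$2'" >&2
    return 1
  fi
}

# Run after the export lines and before "mkdir build".
check_dir BGEN_PATH "${BGEN_PATH:-}" || echo "set BGEN_PATH before running cmake"
check_dir ZLIB_LIBRARY "${ZLIB_LIBRARY:-}" || echo "set ZLIB_LIBRARY before running cmake"
```

If either line reports a missing directory, correct the corresponding export before configuring.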

Loading intel/mkl/mic/2018.4 is optional but desirable. The resulting build reports:

              |============================|
              |        REGENIE v3.2.7      |
              |============================|

Copyright (c) 2020-2022 Joelle Mbatchou, Andrey Ziyatdinov and Jonathan Marchini.
Distributed under the MIT License.
Using Intel MKL with Eigen.


Usage:
  ./regenie-3.2.7 [OPTION...]

  -h, --help      print list of available options
      --helpFull  print list of all available options

 Main options:
      --step INT                specify if fitting null model (=1) or
                                association testing (=2)
      --bed PREFIX              prefix to PLINK .bed/.bim/.fam files
      --pgen PREFIX             prefix to PLINK2 .pgen/.pvar/.psam files
      --bgen FILE               BGEN file
      --sample FILE             sample file corresponding to BGEN file
      --ref-first               use the first allele as the reference for
                                BGEN or PLINK bed/bim/fam input format
                                [default assumes reference is last]
      --keep FILE               comma-separated list of files listing
                                samples to retain in the analysis (no
                                header; starts with FID IID)
      --remove FILE             comma-separated list of files listing
                                samples to remove from the analysis (no
                                header; starts with FID IID)
      --extract FILE            comma-separated list of files with IDs of
                                variants to retain in the analysis
      --exclude FILE            comma-separated list of files with IDs of
                                variants to remove from the analysis
  -p, --phenoFile FILE          phenotype file (header required starting
                                with FID IID)
      --phenoCol STRING         phenotype name in header (use for each
                                phenotype to keep; can use parameter
                                expansion {i:j})
      --phenoColList STRING,..,STRING
                                comma separated list of phenotype names to
                                keep (can use parameter expansion {i:j})
  -c, --covarFile FILE          covariate file (header required starting
                                with FID IID)
      --covarCol STRING         covariate name in header (use for each
                                covariate to keep; can use parameter
                                expansion {i:j})
      --covarColList STRING,..,STRING
                                comma separated list of covariate names to
                                keep (can use parameter expansion {i:j})
      --catCovarList STRING,..,STRING
                                comma separated list of categorical
                                covariates
  -o, --out PREFIX              prefix for output files
      --qt                      analyze phenotypes as quantitative
      --bt                      analyze phenotypes as binary
  -1, --cc12                    use control=1,case=2,missing=NA encoding
                                for binary traits
  -b, --bsize INT               size of genotype blocks
      --cv INT(=5)              number of cross validation (CV) folds
      --loocv                   use leave-one out cross validation (LOOCV)
      --l0 INT(=5)              number of ridge parameters to use when
                                fitting models within blocks [evenly spaced
                                in (0,1)]
      --l1 INT(=5)              number of ridge parameters to use when
                                fitting model across blocks [evenly spaced
                                in (0,1)]
      --lowmem                  reduce memory usage by writing level 0
                                predictions to temporary files
      --lowmem-prefix PREFIX    prefix where to write the temporary files
                                in step 1 (default is to use prefix from
                                --out)
      --split-l0 PREFIX,N       split level 0 across N jobs and set prefix
                                of output files
      --run-l0 FILE,K           run level 0 for job K in {1..N} using
                                master file created from '--split-l0'
      --run-l1 FILE             run level 1 using master file from
                                '--split-l0'
      --l1-phenoList STRING,...,STRING
                                run level 1 for a subset of the phenotypes
                                (specified as comma-separated list)
      --keep-l0                 avoid deleting the level 0 predictions
                                written on disk after fitting the level 1
                                models
      --strict                  remove all samples with missingness at any
                                of the traits
      --print-prs               also output polygenic predictions without
                                using LOCO (=whole genome PRS)
      --gz                      compress output files (gzip format)
      --apply-rint              apply Rank-Inverse Normal Transformation to
                                quantitative traits
      --apply-rerint            apply Rank-Inverse Normal Transformation to
                                residualized quantitative traits in step 2
      --apply-rerint-cov        apply Rank-Inverse Normal Transformation to
                                residualized quantitative traits and
                                project covariates out in step 2
      --threads INT             number of threads
      --pred FILE               file containing the list of predictions
                                files from step 1
      --ignore-pred             skip reading predictions from step 1
                                (equivalent to linear/logistic regression
                                with only covariates)
      --use-prs                 when using whole genome PRS step 1 output
                                in '--pred'
      --write-samples           write IDs of samples included for each
                                trait (only in step 2)
      --minMAC FLOAT(=5)        minimum minor allele count (MAC) for tested
                                variants
      --minINFO DOUBLE(=0)      minimum imputation info score (Impute/Mach
                                R^2) for tested variants
      --no-split                combine asssociation results into a single
                                for all traits
      --firth                   use Firth correction for p-values less than
                                threshold
      --approx                  use approximation to Firth correction for
                                computational speedup
      --spa                     use Saddlepoint approximation (SPA) for
                                p-values less than threshold
      --pThresh FLOAT(=0.05)    P-value threshold below which to apply
                                Firth/SPA correction
      --write-null-firth        store coefficients from null models with
                                approximate Firth for step 2
      --compute-all             store Firth estimates for all chromosomes
      --use-null-firth FILE     use stored coefficients for null model in
                                approximate Firth
      --chr STRING              specify chromosome to test in step 2 (use
                                for each chromosome)
      --chrList STRING,..,STRING
                                Comma separated list of chromosomes to test
                                in step 2
      --range CHR:MINPOS-MAXPOS
                                to specify a physical position window for
                                variants to test in step 2
      --sex-specific STRING     for sex-specific analyses (male/female)
      --af-cc                   print effect allele frequencies among
                                cases/controls for step 2
      --test STRING             'additive', 'dominant' or 'recessive'
                                (default is additive test)
      --condition-list FILE     file with list of variants to include as
                                covariates
      --condition-file FORMAT,FILE
                                optional genotype file which contains the
                                variants to include as covariates
      --condition-file-sample FILE
                                sample file accompanying BGEN file with the
                                conditional variants
      --interaction STRING      perform interaction testing with a
                                quantitative/categorical covariate
      --interaction-snp STRING  perform interaction testing with a variant
      --interaction-file FORMAT,FILE
                                optional genotype file which contains the
                                variant for GxG interaction test
      --interaction-file-sample FILE
                                sample file accompanying BGEN file with the
                                interacting variant
      --interaction-file-reffirst
                                use the first allele as the reference for
                                the BGEN or PLINK file with the interacting
                                variant [default assumes reference is last]
      --interaction-prs         perform interaction testing with the full
                                PRS from step 1
      --force-condtl            to also condition on interacting SNP in the
                                marginal GWAS test
      --no-condtl               to print out all main effects in GxE
                                interaction test
      --rare-mac FLOAT(=1000)   minor allele count (MAC) threshold below
                                which to use HLM for interaction testing
                                with QTs
      --set-list FILE           file with sets definition
      --extract-sets FILE       comma-separated list of files with IDs of
                                sets to retain in the analysis
      --exclude-sets FILE       comma-separated list of files with IDs of
                                sets to remove from the analysis
      --extract-setlist STRING  comma separated list of sets to retain in
                                the analysis
      --exclude-setlist STRING  comma separated list of sets to remove from
                                the analysis
      --anno-file FILE          file with variant annotations
      --anno-labels FILE        file with labels to annotations
      --mask-def FILE           file with mask definitions
      --aaf-file FILE           file with AAF to use when building masks
      --set-singletons          use 0/1 indicator in third column of AAF
                                file to specify singleton variants
      --aaf-bins FLOAT,..,FLOAT
                                comma separated list of AAF bins cutoffs
                                for building masks
      --build-mask STRING       rule to construct masks, can be 'max',
                                'sum' or 'comphet' (default is max)
      --vc-tests STRING,..,STRING
                                comma separated list of tests to compute
                                for each set of variants included in a mask
                                [skat/skato/skato-acat/acatv/acato]
      --vc-maxAAF FLOAT(=1)     maximum AAF for variants included in
                                gene-based tests
      --weights-col arg         column index (1-based) for user-defined
                                weights in annotation file
      --multiply-weights        multiply the user defined weights by the
                                default SKAT weights in SKAT/ACAT tests
      --joint STRING            comma spearated list of joint tests to
                                perform
      --singleton-carrier       define singletons as variants with a single
                                carrier in the sample
      --write-mask              write masks in PLINK bed/bim/fam format
      --mask-lovo STRING        apply Leave-One-Variant-Out (LOVO) scheme
                                when building masks
                                (<set_name>,<mask_name>,<aaf_cutoff>)
      --mask-lodo STRING        apply Leave-One-Domain-Out (LODO) scheme
                                when building masks
                                (<set_name>,<mask_name>,<aaf_cutoff>)
      --skip-test               skip computing association tests after
                                building masks
      --check-burden-files      check annotation file, set list file and
                                mask file for consistency
      --strict-check-burden     to exit early if the annotation, set list
                                and mask definition files don't agree
      --force-qt                force QT run for traits with few unique
                                values
      --par-region STRING(=hg38)
                                build code to identify PAR region
                                boundaries on chrX

For more information, use option '--help' or visit the website: https://rgcgithub.github.io/regenie/
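To illustrate how the main options listed above fit together, here is a minimal sketch of a two-step run for a quantitative trait. All file names (example, phenotype.txt, covariates.txt, fit) are hypothetical placeholders and the block sizes are illustrative values, not recommendations. The sketch assumes that step 1 writes its list of prediction files as PREFIX_pred.list, which is then passed to --pred in step 2. It only assembles and prints the two command lines so they can be inspected before anything is run.

```shell
# Hypothetical inputs; replace with your own files.
geno=example            # prefix of example.bed/.bim/.fam
pheno=phenotype.txt     # header must start with FID IID
covar=covariates.txt    # header must start with FID IID
out=fit                 # prefix for output files

# Step 1: fit the null model (whole-genome regression).
step1_cmd="regenie --step 1 --bed ${geno} --phenoFile ${pheno} --covarFile ${covar} --qt --bsize 1000 --lowmem --out ${out}_step1"

# Step 2: association testing, reading the step 1 predictions via --pred.
step2_cmd="regenie --step 2 --bed ${geno} --phenoFile ${pheno} --covarFile ${covar} --qt --bsize 400 --pred ${out}_step1_pred.list --out ${out}_step2"

# Print the commands for inspection rather than running them.
echo "$step1_cmd"
echo "$step2_cmd"
```

Once the file names are correct, the printed commands can be run in order; for binary traits, --bt would replace --qt.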

Reference

Mbatchou, J., Barnard, L., Backman, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet 53, 1097–1103 (2021). https://doi.org/10.1038/s41588-021-00870-7