B. Notes on ensembl-vep
Location & module
VEP is installed on CSD3 at
~/rds/rds-jmmh2-public_databases/ensembl-vep
The CSD3 module is named ceuadmin/ensembl-vep/104, and /usr/local/Cluster-Apps/ceuadmin/ensembl-vep/104 is a symbolic link to the CSD3 location above.
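The symbolic link can be checked directly from the shell. The following is a minimal sketch, assuming GNU readlink is available; link_target is a hypothetical helper name and is not part of VEP or the module system.

```shell
# Hypothetical helper: print the resolved target of a symbolic link,
# or fail with a message if the path is not a symbolic link.
link_target() {
  if [ ! -L "$1" ]; then
    echo "not a symbolic link: $1" >&2
    return 1
  fi
  readlink -f "$1"
}

# On CSD3 one might run (not executed here):
#   link_target /usr/local/Cluster-Apps/ceuadmin/ensembl-vep/104
```

If the module directory is set up as described, the printed path should be the CSD3 location given above.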
On icelake, we use the module ceuadmin/ensembl-vep/111-icelake, e.g.,
module load ceuadmin/ensembl-vep/111-icelake
vep --help
to get
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#
Versions:
ensembl : 111.a6cc543
ensembl-funcgen : 111.5327cdd
ensembl-io : 111.dbba8d6
ensembl-variation : 111.d616b1e
ensembl-vep : 111.0
Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl
http://www.ensembl.org/info/docs/tools/vep/script/index.html
Usage:
./vep [--cache|--offline|--database] [arguments]
Basic options
=============
--help Display this message and quit
-i | --input_file Input file
-o | --output_file Output file
--force_overwrite Force overwriting of output file
--species [species] Species to use [default: "human"]
--everything Shortcut switch to turn on commonly used options. See web
documentation for details [default: off]
--fork [num_forks] Use forking to improve script runtime
For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html
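Since modules for more than one release exist, it can help to confirm which release is actually loaded. The following is a minimal sketch, assuming the banner keeps the "name : version" layout shown above; vep_release is a hypothetical helper name.

```shell
# Hypothetical helper: read the VEP help banner on stdin and print the
# version reported on the "ensembl-vep" line only (not "ensembl",
# "ensembl-io", and so on).
vep_release() {
  awk -F':' '
    { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
    $1 == "ensembl-vep" { print $2 }
  '
}

# With a module loaded one might run (not executed here):
#   vep --help | vep_release
```

The match is on the exact component name so that the other version lines in the banner are ignored.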
Features
- VEP version 104
- GRCh38 assembly
- homo_sapiens/homo_sapiens_merged species
- kent-335_base/ as required by setup for perl5/ Bio::DB::BigFile below
- Perl modules as in perl5/
- Plugins
Plugins
The version check under icelake is furnished with perl/5.26.3_system/gcc-8.4.1-4cl2czq:
perl INSTALL.pl -a p -g list
— clinvar —
The compressed VCF and index files for GRCh38 are clinvar.vcf.gz and clinvar.vcf.gz.tbi; their GRCh37 counterparts are clinvar_GRCh37.vcf.gz and clinvar_GRCh37.vcf.gz.tbi, respectively.
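Because the two assemblies use different file names, a script can select the right pair from the assembly label. The following is a minimal sketch; clinvar_vcf and clinvar_ready are hypothetical helper names, and the directory holding the files is passed in as an argument because its location is an assumption here.

```shell
# Hypothetical helper: map an assembly label to the ClinVar VCF file name.
clinvar_vcf() {
  case "$1" in
    GRCh38) echo "clinvar.vcf.gz" ;;
    GRCh37) echo "clinvar_GRCh37.vcf.gz" ;;
    *) echo "unknown assembly: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: succeed only if both the VCF and its .tbi index
# exist in the given directory.
# Usage: clinvar_ready DIR ASSEMBLY
clinvar_ready() {
  local dir="$1" vcf
  vcf=$(clinvar_vcf "$2") || return 1
  [ -f "$dir/$vcf" ] && [ -f "$dir/$vcf.tbi" ]
}
```

Checking for the index alongside the VCF catches the common case where only the compressed file was copied.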
— loftee —
This is in line with the VEP installation, which only includes GRCh38 reference files.
GRCh38
#!/usr/bin/bash
export ENSEMBL=~/rds/rds-jmmh2-public_databases/ensembl-vep
export PERL5LIB=${ENSEMBL}/Bio:${ENSEMBL}/perl5/lib/perl5:${ENSEMBL}/loftee:$HPC_WORK/bin
export rds=.. # ~/rds/rds-jmmh2-public_databases/ensembl-vep will be user-specific
export outdir=..
export LOFTEE38=${ENSEMBL}/loftee/loftee_data/GRCh38
export LOFTEE38GERP=${LOFTEE38}/gerp_conservation_scores.homo_sapiens.GRCh38.bw
export LOFTEE38HA=${LOFTEE38}/human_ancestor.fa.gz
export LOFTEE38SQL=${LOFTEE38}/loftee.sql
if [ ! -f Homo_sapiens.GRCh38.dna.toplevel.fa ]; then
ln -sf ${rds}/.vep/homo_sapiens_merged/104_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa
fi
${rds}/vep --input_file VEP_input.txt \
--format ensembl \
--output_file ${outdir}/VEP_output.txt \
--force_overwrite \
--offline \
--symbol \
--merged \
--fasta Homo_sapiens.GRCh38.dna.toplevel.fa \
--dir_cache ${rds}/.vep \
--dir_plugins . \
--protein \
--tsl \
--canonical \
--mane_select \
--biotype \
--check_existing \
--sift b \
--polyphen b \
--plugin LoF,loftee_path:.,gerp_bigwig:${LOFTEE38GERP},human_ancestor_fa:${LOFTEE38HA},conservation_file:${LOFTEE38SQL}
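The script above depends on several reference files, and a missing one typically surfaces only once VEP is already running. The following is a minimal pre-flight sketch; require_files is a hypothetical helper name and is not part of VEP or loftee.

```shell
# Hypothetical helper: report every path that does not exist and
# return non-zero if at least one is missing.
require_files() {
  local f status=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Before the vep call one might add (not executed here):
#   require_files "$LOFTEE38GERP" "$LOFTEE38HA" "$LOFTEE38SQL" || exit 1
```

All paths are checked before returning, so a single run lists everything that needs fixing.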
GRCh37
This mirrors the GRCh37 setup, based on version 98.3, from /rds/rds-jmmh2-public_databases/software/ensembl-vep; see also loftee-grch37/test.sh.
#!/usr/bin/bash
export ENSEMBL=~/rds/rds-jmmh2-public_databases/ensembl-vep
export PERL5LIB=${ENSEMBL}/Bio:${ENSEMBL}/perl5/lib/perl5:${ENSEMBL}/loftee-grch37:$HPC_WORK/bin
export rds=.. # ~/rds/rds-jmmh2-public_databases/ensembl-vep will be user-specific
export LOFTEE37=${ENSEMBL}/loftee-grch37
export LOFTEE37GERP=${LOFTEE37}/GERP_scores.final.sorted.txt.gz
export LOFTEE37HA=${LOFTEE37}/human_ancestor.fa.rz
export LOFTEE37SQL=${LOFTEE37}/phylocsf_gerp.sql
${rds}/vep --input_file VEP_input.txt \
--format ensembl \
--output_file VEP_output_GRCh37.txt \
--force_overwrite \
--offline \
--symbol \
--dir_cache ${rds}/.vep \
--dir_plugins . \
--use_given_ref \
--check_existing \
--protein \
--tsl \
--canonical \
--mane_select \
--biotype \
--sift b \
--polyphen b \
--plugin LoF,loftee_path:.,human_ancestor_fa:${LOFTEE37HA},conservation_file:${LOFTEE37SQL}
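Both scripts read VEP_input.txt with --format ensembl, VEP's default whitespace-separated format: chromosome, start, end, allele string (reference/alternative), strand, and an optional identifier. The following is a minimal sketch that writes a small example file and runs a simplified sanity check on it. The helper names are hypothetical, the coordinates are illustrative only, and the check covers small variants only (it expects a "/" in the allele field and a "+" or "-" strand).

```shell
# Hypothetical helper: write a few illustrative lines in ensembl format.
# An insertion is written with start = end + 1 and "-" as the reference.
write_example_input() {
  cat > "$1" <<'EOF'
1 881907 881906 -/C +
5 140532 140532 T/C +
19 66520 66520 G/A + var1
EOF
}

# Hypothetical helper: print any line that does not look like
# "chr start end ref/alt strand [id]" and return non-zero if one is found.
check_ensembl_format() {
  awk '
    NF < 5 || NF > 6 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ ||
    index($4, "/") == 0 || ($5 != "+" && $5 != "-") {
      print "line " NR ": " $0
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

A typical use would be to run check_ensembl_format VEP_input.txt before either script and stop if it returns non-zero.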
— GeneSplicer —
This is a self-contained plugin.
— REVEL —
The REVEL/ directory contains reference files for REVEL scores.