hap.py

Web: https://github.com/Illumina/hap.py

wget -qO- https://github.com/Illumina/hap.py/archive/refs/tags/v0.3.15.tar.gz | tar xvfz -
cd hap.py-0.3.15/
module load gcc/6
source ~/rds/software/py2.7/bin/activate
export HG19=~/rds/public_databases/lib64/giab_lsk114_2022.12/analysis/hg002_truvari_svs/human_g1k_v37.fasta
python install.py $CEUADMIN/hap.py/0.3.15 --with-rtgtools

Now we set to replicate ONT benchmarks,

module load ceuadmin/awscli
aws s3 ls --no-sign-request s3://ont-open-data/giab_lsk114_2022.12/
aws s3 sync --no-sign-request s3://ont-open-data/giab_lsk114_2022.12/ giab_lsk114_2022.12
cd giab_lsk114_2022.12/benchmarking
mkdir truthset && cd truthset
wget ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/NISTv4.2.1/GRCh38/*
cd ../..

which contains analysis/run-happy.sh, when modified as happy.sb and after running gives

Hap.py
[W] overlapping records at chr6:29747433 for sample 0
[W] Variants that overlap on the reference allele: 5
[I] Total VCF records:         4048342
[I] Non-reference VCF records: 4048342
[W] overlapping records at chr1:151482 for sample 0
[W] Variants that overlap on the reference allele: 5359
[I] Total VCF records:         6084932
[I] Non-reference VCF records: 6084932
...

as in happy.e as well as output happy.o

Formatting FASTA data
Processing "GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"

Detected: 'Human GRCh38 with UCSC naming', installing reference.txt


Input Data
Files              : GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
Format             : FASTA
Type               : DNA
Number of sequences: 195
Total residues     : 3099922541
Minimum length     : 970
Mean length        : 15897038
Maximum length     : 248956422

Output Data
SDF-ID             : b881adee-81c0-4dc3-b5b3-300ef6ca0b0b
Number of sequences: 195
Total residues     : 3099922541
Minimum length     : 970
Mean length        : 15897038
Maximum length     : 248956422
Hap.py
Benchmarking Summary:
  Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
 INDEL    ALL       525469    430462     95007       776455     55575     279008  14126  20494       0.819196          0.888280        0.359336         0.852340                     NaN                     NaN                   1.528276                   1.767044
 INDEL   PASS       525469    430462     95007       776455     55575     279008  14126  20494       0.819196          0.888280        0.359336         0.852340                     NaN                     NaN                   1.528276                   1.767044
   SNP    ALL      3365127   3361205      3922      4783865      7749    1413355   1385   3124       0.998835          0.997701        0.295442         0.998267                2.100128                1.668217                   1.581196                   1.671398
   SNP   PASS      3365127   3361205      3922      4783865      7749    1413355   1385   3124       0.998835          0.997701        0.295442         0.998267                2.100128                1.668217                   1.581196                   1.671398

NB only HG002 is tested here.