snakemake

GitHub (documentation; stable documentation)

Snakemake is a Python-based workflow management system.

⭐ Training material: https://github.com/troycomi/snakemake-training

9.9.0-miniforge3

(21/8/2025)

This turns out to be ideal for benchmarking MitoImpute.

module load ceuadmin/miniforge3
export PREFIX=/usr/local/Cluster-Apps/ceuadmin/snakemake/9.9.0-miniforge3
conda create --prefix "$PREFIX" -c conda-forge -c bioconda snakemake mamba fastqc
conda list --prefix "$PREFIX" | grep -e snakemake -e mamba -e fastqc

The output matches that of the Anaconda3-based 9.9.0 installation below:

# packages in environment at /usr/local/Cluster-Apps/ceuadmin/snakemake/9.9.0-miniforge3:
fastqc                                  0.12.1           hdfd78af_0            bioconda
libmamba                                2.3.1            hae34dd5_1            conda-forge
mamba                                   2.3.1            hf857f84_1            conda-forge
snakemake                               9.9.0            hdfd78af_0            bioconda
snakemake-interface-common              1.21.0           pyhdfd78af_0          bioconda
snakemake-interface-executor-plugins    9.3.9            pyhdfd78af_0          bioconda
snakemake-interface-logger-plugins      1.2.4            pyhdfd78af_0          bioconda
snakemake-interface-report-plugins      1.2.0            pyhdfd78af_0          bioconda
snakemake-interface-storage-plugins     4.2.2            pyhdfd78af_0          bioconda
snakemake-minimal                       9.9.0            pyhdfd78af_0          bioconda
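To check the installed versions from a script rather than by eye, the `conda list` output can be parsed. Below is a minimal Python sketch, assuming the whitespace-separated name/version/build/channel layout shown above; the helpers `conda_list_output`, `parse_conda_list` and `mismatched_versions` are hypothetical names, not part of conda.

```python
import subprocess
from typing import Dict, Optional


def conda_list_output(prefix: str) -> str:
    """Run `conda list --prefix PREFIX` and return its text output (needs conda on PATH)."""
    result = subprocess.run(["conda", "list", "--prefix", prefix],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_conda_list(text: str) -> Dict[str, str]:
    """Map package name -> version, skipping blank lines and '#' header lines."""
    versions: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # name, version, build, channel
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


def mismatched_versions(text: str, expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: found version, or None if absent} for packages not at the expected version."""
    found = parse_conda_list(text)
    return {pkg: found.get(pkg) for pkg, want in expected.items() if found.get(pkg) != want}
```

For example, `mismatched_versions(conda_list_output(PREFIX), {"snakemake": "9.9.0", "fastqc": "0.12.1"})` should return an empty dict for an environment matching the listing above.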

9.9.0

Owing to current issues with Miniconda3, we use Anaconda3 instead, with the following steps:

#!/bin/bash
set -e

# 1. Load the system-wide Anaconda module
module load ceuadmin/Anaconda3/2024.10-1

# 2. Initialize conda for shell
export CONDA_PREFIX=$CEUADMIN/Anaconda3/2024.10-1
source "${CONDA_PREFIX}/etc/profile.d/conda.sh"
conda activate base

# 3. Configure channels with strict priority
#    (`--add` prepends, so the channel added last, conda-forge, gets the highest priority)
conda config --env --add channels defaults
conda config --env --add channels bioconda
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict

# 4. Create isolated snakemake env with mamba included
PREFIX="/usr/local/Cluster-Apps/ceuadmin/snakemake/9.9.0"
conda create --yes --prefix "$PREFIX" -c conda-forge -c bioconda snakemake mamba fastqc

# 5. Verify version to confirm successful installation
source "${CONDA_PREFIX}/etc/profile.d/conda.sh"
conda activate "$PREFIX"
conda list | grep -e snakemake -e mamba -e fastqc

The package listing confirms the installation:

# packages in environment at /usr/local/Cluster-Apps/ceuadmin/snakemake/9.9.0:
fastqc                                  0.12.1           hdfd78af_0            bioconda
libmamba                                2.3.1            hae34dd5_1            conda-forge
mamba                                   2.3.1            hf857f84_1            conda-forge
snakemake                               9.9.0            hdfd78af_0            bioconda
snakemake-interface-common              1.21.0           pyhdfd78af_0          bioconda
snakemake-interface-executor-plugins    9.3.9            pyhdfd78af_0          bioconda
snakemake-interface-logger-plugins      1.2.4            pyhdfd78af_0          bioconda
snakemake-interface-report-plugins      1.2.0            pyhdfd78af_0          bioconda
snakemake-interface-storage-plugins     4.2.2            pyhdfd78af_0          bioconda
snakemake-minimal                       9.9.0            pyhdfd78af_0          bioconda
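The channel settings in step 3 can be modelled in a few lines. `conda config --add` prepends a channel to the list (moving it to the top if it is already present), so the channel added last has the highest priority; with `channel_priority strict`, packages in lower-priority channels are not considered when a package of the same name exists in a higher-priority channel. This is a simplified Python model, not conda's solver; `add_channel`, `strict_source` and the toy package index are hypothetical.

```python
from typing import Dict, List, Optional, Set


def add_channel(channels: List[str], name: str) -> List[str]:
    """Mimic `conda config --add channels NAME`: NAME moves to the top (highest priority)."""
    return [name] + [c for c in channels if c != name]


def strict_source(package: str, channels: List[str], index: Dict[str, Set[str]]) -> Optional[str]:
    """Under strict priority, only the highest-priority channel offering `package` is used."""
    for channel in channels:  # ordered from highest to lowest priority
        if package in index.get(channel, set()):
            return channel
    return None  # not available in any configured channel


channels: List[str] = []
for name in ["defaults", "bioconda", "conda-forge"]:
    channels = add_channel(channels, name)
print(channels)  # ['conda-forge', 'bioconda', 'defaults']

# Hypothetical, toy package availability per channel (for illustration only).
index = {
    "conda-forge": {"python", "mamba"},
    "bioconda": {"snakemake", "fastqc"},
    "defaults": {"python"},
}
print(strict_source("python", channels, index))  # conda-forge (defaults is never consulted)
```

The order of the `--add` calls therefore matters, whereas `conda config --append` would add a channel at the bottom instead.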

7.19.1

Installation

We illustrate the installation of snakemake, together with mamba and fastqc, at a designated location.

module load miniconda3/4.5.1
export mypath=${HOME}/COVID-19/miniconda37
conda create --prefix ${mypath} python=3.7 ipykernel
conda init bash
source ~/.bashrc
source activate ${mypath}
conda install -c conda-forge mamba
mamba install -c bioconda snakemake-minimal
conda install -c bioconda snakemake
conda install -c bioconda fastqc
snakemake --help
conda deactivate

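Since --prefix is given, the environment lives at ${mypath}; had --name miniconda37 been used instead, it would typically be created under ${HOME}/.conda/envs/miniconda37.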

After installation, later sessions only need the following:

module load miniconda3/4.5.1
export mypath=${HOME}/COVID-19/miniconda37
source activate ${mypath}

CSD3 module

A central installation is available on CSD3 as a module:

module load ceuadmin/snakemake/7.19.1
snakemake --help

Alternatively,

module load miniconda3/4.5.1
source activate /usr/local/Cluster-Apps/ceuadmin/snakemake/7.19.1
snakemake --help
source deactivate

slurm

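The --cluster-config option has been deprecated in favour of profiles, e.g., https://github.com/Snakemake-Profiles/slurm for Snakemake 7.x; from Snakemake 8 onwards, SLURM submission is handled by the executor plugin snakemake-executor-plugin-slurm (--executor slurm).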

Python functions

The expand() function, together with glob_wildcards(), can be called directly from Python to see what it does:

python3
>>> from snakemake.io import expand, glob_wildcards
>>> expand("{a}-{b}.tst",a=['a', 'b', 'c'],b=[1, 2, 3])
['a-1.tst', 'a-2.tst', 'a-3.tst', 'b-1.tst', 'b-2.tst', 'b-3.tst', 'c-1.tst', 'c-2.tst', 'c-3.tst']
>>> expand("{sample}_{id}.txt", zip, sample=["a", "b", "c"], id=["1", "2", "3"])
['a_1.txt', 'b_2.txt', 'c_3.txt']
>>> proteins = glob_wildcards('METAL/{metal}-chrX-1.tbl.gz').metal
>>> len(proteins)
987

Note the zip argument, which pairs the values element-wise instead of forming every combination (the default behaviour, equivalent to itertools.product).
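To make what expand() and glob_wildcards() compute more concrete, here is a simplified pure-Python sketch. `expand_sketch` and `glob_wildcards_sketch` are hypothetical stand-ins, not Snakemake's implementation: they handle a single pattern, assume each wildcard appears once, use Snakemake's default wildcard regex `.+`, and match files only.

```python
import itertools
import os
import re
from collections import namedtuple
from typing import Callable, Iterable, List

WILDCARD = re.compile(r"\{(\w+)\}")


def expand_sketch(pattern: str, combinator: Callable = itertools.product,
                  **wildcards: Iterable) -> List[str]:
    """Fill `pattern` for each combination of wildcard values (product by default, or zip)."""
    names = list(wildcards)
    values = [list(wildcards[name]) for name in names]
    return [pattern.format(**dict(zip(names, combo))) for combo in combinator(*values)]


def glob_wildcards_sketch(pattern: str):
    """Return a namedtuple of lists holding the wildcard values of files matching `pattern`."""
    names, parts, last = [], [], 0
    for m in WILDCARD.finditer(pattern):
        parts.append(re.escape(pattern[last:m.start()]))  # literal text between wildcards
        parts.append(f"(?P<{m.group(1)}>.+)")             # each wildcard matches '.+'
        names.append(m.group(1))
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    regex = re.compile("".join(parts))
    # Walk from the directory part that precedes the first wildcard.
    root = os.path.dirname(pattern.split("{", 1)[0]) or "."
    paths = sorted(
        os.path.normpath(os.path.join(dirpath, f))
        for dirpath, _, files in os.walk(root)
        for f in files
    )
    matches = [m for m in map(regex.fullmatch, paths) if m]
    Wildcards = namedtuple("Wildcards", names)
    return Wildcards(*[[m.group(n) for m in matches] for n in names])
```

With the inputs of the REPL session above, `expand_sketch("{sample}_{id}.txt", zip, sample=["a", "b", "c"], id=["1", "2", "3"])` gives `['a_1.txt', 'b_2.txt', 'c_3.txt']`, and dropping `zip` falls back to all nine combinations.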

Examples

Further examples follow.

1. hello world

wget -qO- https://github.com/snakemake/snakemake/archive/refs/tags/v7.12.0.tar.gz | \
tar xvfz -
cd snakemake-7.12.0/examples/c/src
snakemake -j4
./hello

Hello makefiles!

Other examples in the examples/ directory of the GitHub repository can be run in the same way.

2. MRpipeline (MRcovid)

3. CVD1-HF analysis

4. DrugTargetMethodComparison

5. gwas-sumstats-harmoniser (Nextflow)
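The hello-world example (1) behaves like make: Snakemake starts from the requested target and works backwards through the rules whose outputs supply the missing inputs, building a DAG of jobs. A toy sketch of that backward resolution follows, assuming exact file names (no wildcards) and ignoring timestamps; `build_order` and the rule table are hypothetical, not taken from the example's Snakefile.

```python
from typing import Dict, List, Set


def build_order(target: str, rules: Dict[str, List[str]]) -> List[str]:
    """List the outputs to build so that every input is built before the file that needs it."""
    order: List[str] = []
    done: Set[str] = set()

    def visit(node: str, path: Set[str]) -> None:
        if node in done:
            return
        if node in path:
            raise ValueError(f"cyclic dependency involving {node!r}")
        path.add(node)
        for dep in rules.get(node, []):  # files without a rule are treated as existing sources
            visit(dep, path)
        path.remove(node)
        done.add(node)
        if node in rules:
            order.append(node)

    visit(target, set())
    return order


# Hypothetical file names, loosely modelled on a C "hello" build.
rules = {
    "hello": ["hello.o", "hellofunc.o"],
    "hello.o": ["hello.c"],
    "hellofunc.o": ["hellofunc.c"],
}
print(build_order("hello", rules))  # ['hello.o', 'hellofunc.o', 'hello']
```

Snakemake additionally skips jobs whose outputs are already up to date and can run independent jobs (here the two object files) in parallel, which is what -j4 allows.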

References

Köster J, Rahmann S. Snakemake–a scalable bioinformatics workflow engine. Bioinformatics 2012, 28(19): 2520–2522; Bioinformatics 2018, 34(20): 3600.

Mölder F, et al. Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. F1000Research 2021, 10:33. https://doi.org/10.12688/f1000research.29032.2

Edwards D. Plant Bioinformatics: Methods and Protocols, 3rd ed. Springer 2022. https://link.springer.com/book/10.1007/978-1-0716-2067-0. Chapters 9 and 11.