To make my Snakemake workflow easier to distribute, I want to Dockerize it. I usually deploy my software stack through conda environments (saved as .yaml files in an envs/ directory of my project folder), which each rule references via the conda: directive and the --use-conda flag at execution time, for example:
rule exampleRule:
    input: "input.file"
    output: "output.file"
    conda:
        os.path.join(workflow.basedir, "envs/<nameOfEnv>.yaml")
    shell: "some shell command"
Following the Snakemake documentation, I know that I can Dockerize my full workflow using the command snakemake --containerize.
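As far as I understand, --containerize writes the Dockerfile to stdout, so generating it (run from the workflow directory) is simply:

snakemake --containerize > Dockerfile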
The resulting Dockerfile looks something like this (a contrived example; in reality I have seven environments, all built inside this container):
FROM condaforge/mambaforge:latest
LABEL io.github.snakemake.containerized="true"
LABEL io.github.snakemake.conda_env_hash="dc12f3c8b1fb3caed02ab3305e24859bd63f4f1ea0c1ed29d71c857e7d0baaf5"
# Step 1: Retrieve conda environments
# Conda environment:
#   source: envs/nameOfEnvironment.yaml
#   prefix: /conda-envs/4e57ed29df8b6f849000ab15b5c719f2
#   channels:
#     - conda-forge
#     - bioconda
#     - anaconda
#     - defaults
#   dependencies:
#     - wget
#     - packageX == 5.3.0
#     - packageY == 2.5.5
#     - packageZ >= 1.1.1
RUN mkdir -p /conda-envs/4e57ed29df8b6f849000ab15b5c719f2
COPY envs/nameOfEnvironment.yaml /conda-envs/4e57ed29df8b6f849000ab15b5c719f2/environment.yaml
# Step 2: Generate conda environments
RUN mamba env create --prefix /conda-envs/4e57ed29df8b6f849000ab15b5c719f2 --file /conda-envs/4e57ed29df8b6f849000ab15b5c719f2/environment.yaml && \
mamba clean --all -y
I have already followed this method successfully and built an image, hosted on Docker Hub, which is then referenced in the Snakefile, e.g. container: "docker://myDockerHub/myWorkflow_dockerimage:latest"
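Building and pushing the image is just the standard Docker workflow, roughly:

docker build -t myDockerHub/myWorkflow_dockerimage:latest .
docker push myDockerHub/myWorkflow_dockerimage:latest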
Then, when executing with the --use-singularity flag, Snakemake will pull this image and build the environments.
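For concreteness, my invocation looks roughly like this (the core count is arbitrary, and I pass --use-conda as well so that the conda: directives resolve to the environments baked into the image):

snakemake --use-conda --use-singularity --cores 8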
The problem is that the rest of the execution seems to proceed much as it would if I had not used Singularity/Docker at all.
Meaning that, to run the workflow this way, I still need some base packages installed on the host, most notably Snakemake itself. Doesn't this somewhat defeat the purpose of containerizing the workflow in the first place, or am I misunderstanding something fundamental?
The only real solution I can think of is to create an image/container with Snakemake installed. I would then run that image using Singularity, but from that point on I would be running the workflow from within the Snakemake image as if I had never containerized the environments themselves. When executing Snakemake from within this image, I cannot use the --use-singularity flag, because Singularity itself is not available inside the image.
It seems completely counterintuitive to run Snakemake without the --use-singularity flag when my whole intention is to use Singularity.
As noted in the comments, the root of the problem here is that you were expecting Snakemake to containerize itself. That is not the purpose of Snakemake's containerization features -- they exist to containerize your pipeline. Someone who wants to run your pipeline still needs to install Snakemake (and Singularity) manually. After they have done that, when they run your pipeline, all the code within the pipeline will run in the specified containers.
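For example, a user's host setup might look roughly like this (the environment name and core count are illustrative; Singularity/Apptainer itself comes from the system package manager or a module system):

# one-time setup on the host: Snakemake itself is not containerized
conda create -n snakemake -c conda-forge -c bioconda snakemake
conda activate snakemake
# running the pipeline: the rules' code executes inside the specified container
snakemake --use-conda --use-singularity --cores 8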