I am writing a Snakefile for a Snakemake workflow. As part of the workflow I need to check whether a set of records in a database has changed and, if they have, re-download them.
My idea was to write a rule that checks the database timestamp and writes it to an output file, and then use that timestamp file as an input to my download rule. The problem is that once the timestamp file has been written, the timestamp rule will never run again, so the timestamp will never be updated.
Is there a way to make this rule run every time? (I know I can force it from the shell, but I would like to specify it in the Snakefile.) Or is there a better way to handle this?
Any code you add to a Snakefile outside of a rule or function definition will be run at startup just like a regular Python script, so you don't need an external shell script. You can implement the logic you want in Python right in the Snakefile, making use of the shell() function if you need it.
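For example, the timestamp idea from the question can run at startup instead of inside a rule. Below is a minimal sketch, assuming you can already obtain the database's last-modified timestamp as a string (how you query the database is up to you). The helper write_timestamp_if_changed() is hypothetical, not part of Snakemake. It rewrites the timestamp file only when the value differs, so the file's modification time changes only when the records do.

```python
from pathlib import Path


def write_timestamp_if_changed(path, timestamp):
    """Write `timestamp` to `path` only if it differs from the stored value.

    Returns True when the file was (re)written. Leaving the file untouched
    when nothing changed keeps its mtime stable, so rules that use it as an
    input are not re-run needlessly.
    """
    p = Path(path)
    new = str(timestamp)
    if p.exists() and p.read_text() == new:
        return False  # unchanged: leave the file and its mtime alone
    p.write_text(new)  # new or changed timestamp: (re)write, bumping mtime
    return True
```

At the top level of the Snakefile you would then call something like write_timestamp_if_changed("db.timestamp", get_db_timestamp()), where get_db_timestamp() is a placeholder for your own query, and list db.timestamp as an input of the download rule.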
One caveat: if you run your workflow on a cluster, this code will be run again for every cluster job submitted. A crude but effective way to avoid this is to guard it with a check like this:
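import sys
from pathlib import Path
# check_database_for_updates() is your own function, not part of Snakemake
if '--nolock' not in sys.argv:  # cluster job invocations pass --nolock
    if check_database_for_updates():
        Path('touch.file').touch()  # create the file or update its mtime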
Then set touch.file as a proxy input to your rule that reads from the database. Does that make sense?
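The proxy-input trick works because Snakemake, like make, treats a job as outdated when one of its inputs has been modified more recently than its outputs. The function below is a simplified model of that timestamp check, written for illustration only (needs_rerun() is a hypothetical name, not Snakemake's actual code):

```python
import os


def needs_rerun(inputs, outputs):
    """Simplified make-style mtime check (illustration, not Snakemake's code).

    A job counts as outdated if there are no outputs, if any output is
    missing, or if any input is newer than the oldest output.
    """
    if not outputs or not all(os.path.exists(o) for o in outputs):
        return True
    oldest_output = min(os.path.getmtime(o) for o in outputs)
    return any(os.path.getmtime(i) > oldest_output for i in inputs)
```

With touch.file listed as an input of the download rule and the downloaded records as its output, touching touch.file makes the rule outdated on the next run. Recent Snakemake versions can also rerun jobs for other reasons (for example changed code or parameters); this model covers only the timestamp check.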
TIM
Since v3.6.0, the onstart handler lets you always execute something before the workflow starts.
Snakemake 3.6.0 adds an onstart handler that will be executed before the workflow starts. Note that dry-runs do not trigger any of the handlers.
It's unfortunate that onstart doesn't get triggered during dry-runs. 
On a similar note, the onsuccess and onerror handlers can be used to execute something when the workflow succeeds or fails, respectively.