I'm running experiments on a model, with a workflow like this:
I'm using Git and Scientific Reproducibility as a guide, where the results of an experiment are stored in a table along with the hash of the commit. I would like to store the results in a directory instead, naming each directory after the commit hash.
Thinking about version control, I would like to isolate the code from the analysis. For example, changing the color of a plot in an IPython notebook in analysis shouldn't change anything in code.
The approach I'm considering is a directory structure like this:
model
- code
- simulation_results
  - a83bc4
  - 23e900
  - etc
- analysis
with separate Git repositories for code and analysis, and simulation_results left out of Git.
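A minimal sketch of the "one results directory per commit" idea, in Python since the analysis side already uses IPython. It assumes the `git` command-line tool is on the PATH and that `model/code` is a Git repository; the helper names (`current_commit`, `is_dirty`, `result_dir_for`, `prepare_results_dir`) are hypothetical. It refuses to pick a directory while the code repo has uncommitted or untracked changes, because the hash would then not describe the code that actually ran.

```python
import subprocess
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def _git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout (needs the git CLI)."""
    out = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )
    return out.stdout


def current_commit(repo: Path) -> str:
    """Full SHA1 of HEAD in `repo`."""
    return _git(repo, "rev-parse", "HEAD").strip()


def is_dirty(repo: Path) -> bool:
    """True if `repo` has modified, staged or untracked files."""
    return bool(_git(repo, "status", "--porcelain").strip())


def result_dir_for(commit: str, results_root: Path) -> Path:
    """Create (if needed) and return results_root/<commit>."""
    if not commit or not set(commit.lower()) <= HEX_DIGITS:
        raise ValueError(f"not a hex commit hash: {commit!r}")
    path = results_root / commit
    path.mkdir(parents=True, exist_ok=True)  # several runs of one commit share it
    return path


def prepare_results_dir(code_repo: Path, results_root: Path) -> Path:
    """Refuse to run on a dirty tree, then return results_root/<HEAD SHA1>."""
    if is_dirty(code_repo):
        raise RuntimeError("commit your changes first: the hash would not match the code")
    return result_dir_for(current_commit(code_repo), results_root)
```

A simulation script would call `prepare_results_dir(Path("model/code"), Path("model/simulation_results"))` before writing any output. Using `exist_ok=True` lets repeated runs of the same commit share a directory; whether that is what you want depends on whether runs can differ in parameters that are not committed.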
Any comments? A better solution? Thanks.
That seems sound, and your structure would be a good fit for git submodules, with model becoming the parent Git repo.
That way, the model repo records the exact SHA1 of both code and analysis, linking them together.
That means you can create your result directory within the private (i.e. not versioned) directory model/simulation_results, named after the SHA1 of the model repo (the "parent" repo). That SHA1 pins the SHA1 of both the code and analysis submodules, so you can reproduce the experiment exactly (based on the exact content of both code and analysis).
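A minimal sketch of that last step, assuming the `git` CLI is available, that `code` and `analysis` are submodules of `model`, and that `model/.gitignore` lists `simulation_results/` so the directory stays unversioned. The helper names and the `manifest.json` file are hypothetical. It parses `git submodule status`, whose lines start with a space when the submodule is checked out at the commit the parent records (`-` not initialized, `+` a different commit, `U` merge conflicts), and only creates `simulation_results/<parent SHA1>` when everything is pinned and committed, since otherwise the parent SHA1 would not describe what ran.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path


def _git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout (needs the git CLI)."""
    out = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )
    return out.stdout


def parse_submodule_status(text: str) -> dict[str, tuple[str, str]]:
    """Map submodule path -> (state, sha1) from `git submodule status` output.

    Paths containing spaces are not handled by this simple parser.
    """
    result: dict[str, tuple[str, str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        state = line[0]  # ' ', '-', '+' or 'U'
        sha1, path = line[1:].split()[:2]
        result[path] = (state, sha1)
    return result


def create_run_dir(results_root: Path, parent_sha1: str,
                   submodules: dict[str, tuple[str, str]]) -> Path:
    """Create results_root/<parent_sha1>/ plus a manifest of submodule SHA1s."""
    unpinned = [p for p, (state, _) in submodules.items() if state != " "]
    if unpinned:
        raise RuntimeError(f"submodules not at the recorded commit: {unpinned}")
    run_dir = results_root / parent_sha1
    run_dir.mkdir(parents=True, exist_ok=False)  # never overwrite earlier results
    manifest = {"model": parent_sha1,
                "submodules": {p: sha1 for p, (_, sha1) in submodules.items()}}
    (run_dir / "manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return run_dir


def new_run_dir(model_repo: Path) -> Path:
    """Results go to <model_repo>/simulation_results/<HEAD of model_repo>."""
    # Tracked changes in the parent, including submodules with new commits or
    # modified content, make the parent SHA1 meaningless for this run.
    if _git(model_repo, "status", "--porcelain", "--untracked-files=no").strip():
        raise RuntimeError("commit changes in model and its submodules first")
    parent_sha1 = _git(model_repo, "rev-parse", "HEAD").strip()
    status = parse_submodule_status(_git(model_repo, "submodule", "status"))
    return create_run_dir(model_repo / "simulation_results", parent_sha1, status)
```

The manifest is redundant with the parent commit, which already determines the submodule SHA1s, but it lets you read off the code and analysis hashes for a result directory without going back to Git. Failing on an existing directory is a deliberate choice; drop `exist_ok=False` if re-running a commit should add to its results.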