Is dvc.yaml supposed to be written or generated by dvc run command?

Question

I'm trying to understand DVC. Most tutorials mention generating dvc.yaml by running the dvc run command.

But at the same time, the dvc.yaml format, which defines the DAG, is also well documented. The fact that it is YAML, and so human readable and writable, suggests that it is meant to be a DSL for specifying your data pipeline.

Can somebody clarify which is the better practice: writing dvc.yaml by hand, or letting the dvc run command generate it? Or is it left to the user's choice, with no technical difference?

Jorge Orpinel Pérez · Accepted Answer

I'd recommend manual editing as the main route! (I believe that's officially recommended since DVC 2.0)

dvc stage add can still be very helpful for generating pipeline files programmatically, but it doesn't support all the features of dvc.yaml, for example setting vars values or defining foreach stages.
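To illustrate the two features named above, here is a minimal hand-written sketch of a dvc.yaml that uses both vars and a foreach stage. The script names, paths, and the data_dir variable are hypothetical and only there for illustration.

```yaml
# dvc.yaml (hand-written sketch; all file names and values are assumptions)
vars:
  # A local variable, referenced below with ${data_dir}
  - data_dir: data

stages:
  clean:
    # foreach expands into one stage per list item; ${item} is the current value
    foreach:
      - raw1
      - raw2
    do:
      cmd: python clean.py ${data_dir}/${item}.csv ${data_dir}/${item}.clean.csv
      deps:
        - clean.py
        - ${data_dir}/${item}.csv
      outs:
        - ${data_dir}/${item}.clean.csv
```

With a list of simple values like this, DVC should expand the definition into stages named clean@raw1 and clean@raw2.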

casper.dcl · Answer

Both, really.

Primarily, dvc run (or the newer dvc stage add followed by dvc exp run) is meant to manage your dvc.yaml file. For most users (including casual ones), this is probably easiest and thus best. The format is guaranteed to be correct (similar to choosing between {git,dvc} config and directly modifying .{git,dvc}/config).
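As a rough sketch of what the generated route looks like, the comment below shows a dvc stage add invocation and the YAML underneath is approximately what it would write to dvc.yaml. The stage name, script, and paths are hypothetical, and the exact ordering or indentation DVC emits may differ.

```yaml
# Generated by (hypothetical stage and paths):
#   dvc stage add -n prepare \
#     -d prepare.py -d data/raw.csv \
#     -o data/prepared.csv \
#     python prepare.py data/raw.csv data/prepared.csv
stages:
  prepare:
    cmd: python prepare.py data/raw.csv data/prepared.csv
    deps:
      - data/raw.csv
      - prepare.py
    outs:
      - data/prepared.csv
```

The result is the same kind of file you would write by hand, so the two approaches can be mixed freely in one project.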

However as you note, dvc.yaml is human-readable. This is intentional so that more advanced users could manually edit the YAML (potentially short-circuiting some validation checks, or unlocking advanced functionality such as foreach stages).

Tags:

directed-acyclic-graphs

data-pipeline

dvc

rajeshnair
