Is there a way I can have a task require the completion of multiple upstream tasks which are still able to finish independently?
create_dashboard should require load_fcr and load_survey to successfully complete.
I do not want to force anything in the 'survey' task chain to require anything from the 'fcr' task chain to complete. I want them to process in parallel and still complete even if one fails. However, the dashboard task requires both to finish loading to the database before it should start.
fcr    *-->*-->*
                \
                 ---> create_dashboard
                /
survey *-->*-->*
You can pass a list of tasks to set_upstream or set_downstream. In your case, if you specifically want to use set_upstream, you could describe your dependencies as:
create_dashboard.set_upstream([load_fcr, load_survey])
load_fcr.set_upstream(process_fcr)
process_fcr.set_upstream(download_fcr)
load_survey.set_upstream(process_survey)
process_survey.set_upstream(download_survey)
Have a look at Airflow's source code: even when you pass just one task object to set_upstream, the method wraps it in a list before doing anything else, so the single-task and list forms behave the same way.
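To make the list-wrapping idea concrete, here is a minimal sketch in plain Python. The Task class below is a hypothetical stand-in written for illustration only; it is not Airflow's implementation and it only records which task ids depend on which.

```python
class Task:
    """Hypothetical stand-in for an Airflow task; it only tracks dependencies."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # ids of tasks that must finish first
        self.downstream = set()  # ids of tasks that wait on this one

    def set_upstream(self, tasks):
        # Accept a single task or a list: a lone task is wrapped in a list.
        if not isinstance(tasks, (list, tuple)):
            tasks = [tasks]
        for other in tasks:
            self.upstream.add(other.task_id)
            other.downstream.add(self.task_id)

    def set_downstream(self, tasks):
        # Mirror image of set_upstream, with the same wrapping step.
        if not isinstance(tasks, (list, tuple)):
            tasks = [tasks]
        for other in tasks:
            other.set_upstream(self)


load_fcr = Task("load_fcr")
load_survey = Task("load_survey")
process_fcr = Task("process_fcr")
create_dashboard = Task("create_dashboard")

create_dashboard.set_upstream([load_fcr, load_survey])  # list form
load_fcr.set_upstream(process_fcr)                       # single-task form

print(sorted(create_dashboard.upstream))  # ['load_fcr', 'load_survey']
print(sorted(load_fcr.upstream))          # ['process_fcr']
```

Both calls go through the same loop, which is why passing one task or several makes no difference to how the dependency is stored.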
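The same dependencies can also be written with set_downstream, calling it on each of the two load tasks:
download_fcr.set_downstream(process_fcr)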
process_fcr.set_downstream(load_fcr)
download_survey.set_downstream(process_survey)
process_survey.set_downstream(load_survey)
load_survey.set_downstream(create_dashboard)
load_fcr.set_downstream(create_dashboard)
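As a rough check that this layout gives the behaviour asked for in the question, the sketch below replays the dependency graph in plain Python. It assumes Airflow's default trigger rule (all_success), under which a task starts only if every upstream task succeeded. The simulate function and the UPSTREAM mapping are illustrative names, not part of Airflow, and this toy model ignores retries and scheduling.

```python
# Upstream map taken from the dependencies above: task -> tasks it waits on.
UPSTREAM = {
    "download_fcr": [],
    "process_fcr": ["download_fcr"],
    "load_fcr": ["process_fcr"],
    "download_survey": [],
    "process_survey": ["download_survey"],
    "load_survey": ["process_survey"],
    "create_dashboard": ["load_fcr", "load_survey"],
}


def simulate(failing):
    """Return a final state per task, assuming every upstream must succeed."""
    states = {}

    def resolve(task):
        if task in states:
            return states[task]
        parents = [resolve(parent) for parent in UPSTREAM[task]]
        if any(state != "success" for state in parents):
            states[task] = "upstream_failed"  # the task is never started
        elif task in failing:
            states[task] = "failed"
        else:
            states[task] = "success"
        return states[task]

    for task in UPSTREAM:
        resolve(task)
    return states


result = simulate(failing={"process_fcr"})
print(result["load_survey"])       # success
print(result["load_fcr"])          # upstream_failed
print(result["create_dashboard"])  # upstream_failed
```

In this model a failure in the fcr chain leaves the survey chain untouched, while create_dashboard is held back because one of its two upstream tasks did not succeed.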