 

Why does Airflow go back to the old start_date after I change it without renaming the DAG?

I am a data engineer and work with airflow regularly.

When redeploying DAGs with a new start date, the commonly recommended best practice is the following:

Don’t change start_date + interval: When a DAG has been run, the scheduler database contains instances of the run of that DAG. If you change the start_date or the interval and redeploy it, the scheduler may get confused because the intervals are different or the start_date is way back. The best way to deal with this is to change the version of the DAG as soon as you change the start_date or interval, i.e. my_dag_v1 and my_dag_v2. This way, historical information is also kept about the old version.
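To make the versioning idea concrete, here is a minimal sketch that models the scheduler's run history as a plain dictionary keyed by dag_id. The dictionary, the `last_run` helper and the dates are hypothetical illustrations, not Airflow's actual storage or API; the point is only that a new dag_id has no history to reconcile, while the old version's runs stay on record.

```python
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical, simplified stand-in for the scheduler's run history:
# dag_id -> logical (execution) dates of past runs.
run_history: Dict[str, List[datetime]] = {
    "my_dag_v1": [datetime(2025, 1, 1), datetime(2025, 1, 2)],
}


def last_run(dag_id: str) -> Optional[datetime]:
    """Return the most recent logical date recorded for dag_id, or None."""
    runs = run_history.get(dag_id, [])
    return max(runs) if runs else None


# Bumping the version gives the DAG a fresh identity: nothing to reconcile
# with the new start_date, and my_dag_v1's history is left untouched.
print(last_run("my_dag_v1"))  # 2025-01-02 00:00:00
print(last_run("my_dag_v2"))  # None
```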

However, after deleting all previous DAG and task runs, I redeployed a DAG with a new start date. It worked as expected (using the new start date) for a day, then went back to using the old one.

What are the reasons for this? An in-depth explanation would be appreciated.

asked Oct 19 '25 at 13:10 by scr



1 Answer

Airflow stores information about all past runs in the dag_run table.

When you delete the previous DAG runs, these entries are removed from the database. Hence, Airflow treats the DAG as a new one and starts scheduling from the specified start_date.
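The effect of deleting the runs can be sketched with an in-memory SQLite table. This is a deliberately simplified, hypothetical model of dag_run with only two columns (the real table also has columns such as state, run_id and start_date), and the query is an illustration rather than the one Airflow itself issues.

```python
import sqlite3
from typing import Optional

# Simplified, hypothetical model of the dag_run table: only dag_id and
# execution_date are kept. ISO-8601 strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT)")
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?)",
    [("my_dag", "2025-01-01T10:00:00"), ("my_dag", "2025-01-02T10:00:00")],
)


def latest_execution_date(dag_id: str) -> Optional[str]:
    """Most recent execution_date for dag_id, or None if it has no runs."""
    row = conn.execute(
        "SELECT MAX(execution_date) FROM dag_run WHERE dag_id = ?", (dag_id,)
    ).fetchone()
    return row[0]  # MAX over zero rows yields NULL, i.e. None


print(latest_execution_date("my_dag"))  # 2025-01-02T10:00:00

# Deleting the DAG's runs removes its history...
conn.execute("DELETE FROM dag_run WHERE dag_id = ?", ("my_dag",))
conn.commit()

# ...so there is no previous run to schedule from, and the DAG looks new.
print(latest_execution_date("my_dag"))  # None
```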

To schedule the next run, Airflow looks up the execution date (logical date) of the most recent DAG run and adds the timedelta you specified in schedule_interval. If there is no previous run, it starts from start_date instead.
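The rule above can be written as a small function. This is a simplified model for a timedelta schedule, not Airflow's scheduler code; `next_execution_date` is a hypothetical name, and the `max(...)` guard (never schedule before start_date) is included as an assumption about how the scheduler treats start_date once history exists.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_execution_date(
    last_execution_date: Optional[datetime],
    start_date: datetime,
    schedule_interval: timedelta,
) -> datetime:
    """Simplified model of timedelta-based scheduling (hypothetical helper)."""
    if last_execution_date is None:
        # No recorded runs: the DAG is treated as new and starts at start_date.
        return start_date
    # Otherwise build on the previous run, but (assumed guard) never go
    # earlier than start_date.
    return max(last_execution_date + schedule_interval, start_date)


start = datetime(2025, 1, 1, 6, 0)
day = timedelta(days=1)
print(next_execution_date(None, start, day))                      # 2025-01-01 06:00:00
print(next_execution_date(datetime(2025, 1, 5, 6, 0), start, day))  # 2025-01-06 06:00:00
```

Note that once a previous run exists, its execution date, not start_date, determines the time of day of the next run.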

If you still run into problems after clearing the DAG runs, here are a few things you can do:

  1. Rename the dag as suggested.
  2. Clear all the DAG runs and keep the DAG paused. Create a DAG run manually, then unpause the DAG. From then on, it will run at the scheduled times.
  3. The best approach is to use a cron expression as the schedule_interval.
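A rough sketch of why a cron-style schedule helps: with a timedelta, the time of day of whatever run is on record gets carried forward, whereas a cron expression such as "0 6 * * *" always realigns to its wall-clock time. The `next_daily_at` function below is a hand-rolled stand-in for a single daily cron entry, not croniter or Airflow's implementation, and the dates are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_with_timedelta(last: datetime, interval: timedelta) -> datetime:
    """timedelta schedule: the previous run's time of day is carried forward."""
    return last + interval


def next_daily_at(last: datetime, at: time) -> datetime:
    """Hypothetical stand-in for a cron entry like "0 6 * * *": the first
    datetime strictly after `last` whose wall-clock time equals `at`."""
    candidate = datetime.combine(last.date(), at)
    if candidate <= last:
        candidate += timedelta(days=1)
    return candidate


# Suppose a run from the old start_date is still on record at 10:00,
# while the new start_date is meant to pin runs to 06:00.
last_recorded = datetime(2025, 1, 5, 10, 0)
print(next_with_timedelta(last_recorded, timedelta(days=1)))  # 2025-01-06 10:00:00
print(next_daily_at(last_recorded, time(6, 0)))               # 2025-01-06 06:00:00
```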
answered Oct 22 '25 at 03:10 by Nitin Pandey



