
Airflow Xcom: How to cast byte array for value into text or json text in SQL?

Tags:

airflow

I'm investigating which data processing jobs are taking longer over time, on installations of our system that have been running for many months. The data files they process vary in size by up to a few orders of magnitude, so I want to normalize processing time against the number of records in the payload, which is locked inside an XCom value.

I would like to build a SQL view that correlates processing duration (end - start) with file size and execution date, to see how stable the processing is over its life cycle.

The documentation online has examples of serializing values into JSON in Python, but our Airflow metadata store is in Postgres, and I want a SQL view that combines the run statistics for the DAGs/tasks with the processing metadata nested inside XCom values.

Does anyone know how to cast the XCom byte value into something parseable in Postgres SQL?

Mark asked Oct 27 '25 13:10


1 Answer

I'm facing the same issue. After digging through the Airflow source, I found this:

https://github.com/apache/airflow/blob/2bea3d74952d0d68d90e8bbc307ac3dfe8fcf2ff/airflow/models/xcom.py#L221

When an XCom value is written to the database, Airflow serializes it first. In airflow.cfg there is a setting enable_xcom_pickling = True, which the serialization code checks:

if conf.getboolean('core', 'enable_xcom_pickling'):
    return pickle.dumps(value)
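
To make the two branches concrete, here is a minimal, self-contained sketch of that logic. It is not the Airflow function itself: the enable_pickling parameter is a hypothetical stand-in for the conf.getboolean('core', 'enable_xcom_pickling') lookup, and the JSON branch reflects what the linked revision appears to do when pickling is off.

```python
import json
import pickle
from typing import Any


def serialize_value(value: Any, enable_pickling: bool) -> bytes:
    """Mimic the two storage formats for an XCom value.

    `enable_pickling` is a hypothetical parameter standing in for the
    [core] enable_xcom_pickling setting read from airflow.cfg.
    """
    if enable_pickling:
        # Binary pickle stream: opaque to SQL.
        return pickle.dumps(value)
    # UTF-8 encoded JSON text: readable once the bytes are decoded.
    return json.dumps(value).encode("UTF-8")


payload = {"records": 1200}
as_json = serialize_value(payload, enable_pickling=False)
as_pickle = serialize_value(payload, enable_pickling=True)
```

With pickling off, the stored bytes are plain JSON text, so in Postgres an expression along the lines of convert_from(value, 'UTF8')::json should be able to read them. With pickling on, the bytes only make sense to Python.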

It looks like the value is pickled and then stored as a byte array. This is annoying, because I don't think there is a way to unpickle the byte array directly in Postgres.
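
One possible workaround is to do the decoding outside Postgres. The sketch below rests on assumptions and is not taken from Airflow: decode_xcom_value and to_json_text are hypothetical helpers that take the raw bytes of an XCom value (however you fetched them), try JSON first, and fall back to pickle. Only unpickle data you trust, since pickle.loads can execute arbitrary code.

```python
import json
import pickle
from typing import Any


def decode_xcom_value(raw: bytes) -> Any:
    """Decode raw XCom bytes (hypothetical helper, not an Airflow API)."""
    try:
        # Values written with pickling disabled are UTF-8 JSON text.
        return json.loads(raw.decode("UTF-8"))
    except ValueError:
        # UnicodeDecodeError and JSONDecodeError both subclass ValueError,
        # so non-JSON bytes land here and are treated as a pickle stream.
        return pickle.loads(raw)


def to_json_text(raw: bytes) -> str:
    """Return the value as JSON text, e.g. to write into a json/jsonb column."""
    return json.dumps(decode_xcom_value(raw))


pickled = pickle.dumps({"records": 1200, "file": "a.csv"})
print(to_json_text(pickled))
```

Some database drivers return bytea columns as a memoryview, so you may need to wrap the value in bytes(...) before passing it in. to_json_text will raise a TypeError if the unpickled object is not JSON-serializable.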

There is also another flag you can set, donot_pickle = False. I'm not sure what it does yet; I'm still looking into it.

Michael Moroch answered Oct 30 '25 13:10