 

Airflow - retrieve files from Windows shared folders?

Tags: python, airflow

What is the best method to grab files from a Windows shared folder on the same network?

Typically, I extract data from SFTP servers, Salesforce, or database tables, but in a few cases end users need to upload a file to a shared folder that I then have to retrieve. Up to now, I have had a script running on a Windows machine that grabs any new or changed files and uploads them to an SFTP server, but that is not ideal: I can't monitor it in my Airflow UI, I have to change my password on that machine in person, mapped network drives seem to break, etc.

Is there a better method? I'd rather the ETL server handle all of this stuff.

  • Airflow is installed on a remote Linux server (same network)
  • The Windows folders are just standard UNC paths that people can access based on their NT ID. These users save the files that I need to retrieve. They are non-technical and did not want WinSCP installed to share the data through SFTP instead, or even to use SharePoint (where I could use SharePlum, I think).
  • I would like to avoid mounting these folders and instead use Python scripts to simply copy the files I need on an Airflow schedule
  • Ideally, I could save my NT ID and password in an Airflow connection and access it with a conn_id
trench asked Jan 28 '26 18:01



1 Answer

If I'm understanding the question correctly, you have a shared folder mounted on your local machine, not on the Linux server where your Airflow install is running. Is it possible to access the shared folder on the server instead?

I think a file sensor would work for your use case.
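For illustration, the check a file sensor performs boils down to polling a path until something matches. The plain-Python sketch below mimics that idea without using Airflow's own sensor classes; the function names and default values are hypothetical, and it assumes the share is already visible as a path on the machine running the task.

```python
import glob
import time


def poke(pattern):
    """Return True once at least one path matches the glob pattern."""
    return len(glob.glob(pattern)) > 0


def wait_for_file(pattern, poke_interval=1.0, timeout=10.0):
    """Poll until a matching file appears; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if poke(pattern):
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep between checks so the loop does not spin.
        time.sleep(poke_interval)
```

In a real DAG, Airflow's sensor classes take care of the scheduling, timeouts, and UI status that this loop only imitates.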

If you could automatically sync the shared folder to a cloud file store like S3, then you could use the commonly used S3KeySensor and S3PrefixSensor. I think this would simplify your solution, as you wouldn't have to be concerned with whether the machines the tasks run on have access to the folder.

Here are two examples of software that syncs a local folder on Windows to S3. Note that I haven't used either of them personally.

  • https://www.cloudberrylab.com/blog/how-to-sync-local-folder-with-amazon-s3-bucket-with-cloudberry-s3-explorer/
  • https://s3browser.com/amazon-s3-folder-sync.aspx

That said, I do think using FTPHook.retrieve_file is a reasonable solution if you can't have your files in cloud storage.
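As a rough sketch of that FTPHook approach, the helper below downloads every file in one remote directory that has not been fetched before. It relies only on two FTPHook methods, `list_directory` and `retrieve_file`. The function name, the `seen` set, and the paths are hypothetical, and it assumes the directory contains only regular files and that the folder is actually exposed over FTP.

```python
import os


def retrieve_new_files(hook, remote_dir, local_dir, seen=None):
    """Download files from remote_dir whose names are not in `seen`.

    `hook` is expected to behave like Airflow's FTPHook, i.e. provide
    list_directory(path) and retrieve_file(remote_path, local_path).
    Returns the list of file names that were downloaded.
    """
    seen = set() if seen is None else seen
    os.makedirs(local_dir, exist_ok=True)
    downloaded = []
    for entry in sorted(hook.list_directory(remote_dir)):
        # Some FTP servers return full paths; keep only the file name.
        name = entry.replace("\\", "/").rsplit("/", 1)[-1]
        if name in ("", ".", "..") or name in seen:
            continue
        remote_path = remote_dir.rstrip("/") + "/" + name
        hook.retrieve_file(remote_path, os.path.join(local_dir, name))
        seen.add(name)
        downloaded.append(name)
    return downloaded
```

Inside a task, you might pass in `FTPHook(ftp_conn_id="my_ftp")`, imported from `airflow.providers.ftp.hooks.ftp` on Airflow 2 (the connection id is made up), so the credentials stay in an Airflow connection. The `seen` set would need to be persisted between runs, for example in an Airflow Variable or a small table.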

Taylor D. Edmiston answered Jan 30 '26 09:01



