I need to set up an AWS Lambda function that is triggered when new CSV files are uploaded to an S3 bucket. It should merge the CSV files into one Master file (they all have the same number of columns and the same column names) and then upload that Master file to another S3 bucket.
I'm using Python for the Lambda function. I created a zip folder with my Lambda function and the dependencies I used (Pandas and Numpy) and uploaded that.
Currently I have to include the CSV files that I want merged in the zip folder itself. The function merges those CSV files, and the output (the Master file) appears in the logs when I check CloudWatch.
I don't know how to link my code to the S3 buckets for input and output.
This is for an app I'm working on.
Here's the Python code I'm using:
import os
import glob
import numpy
import pandas as pd

def handler(event, context):
    # Find all CSV files in the folder using glob pattern matching
    # and save the result in the list all_filenames.
    extension = 'csv'
    all_filenames = glob.glob('*.{}'.format(extension))
    # Combine all files in the list.
    combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
    # Export to CSV.
    combined_csv.to_csv("/tmp/combined_csv.csv", index=False, encoding='utf-8-sig')
    # Print the merged file so that it shows up in the CloudWatch logs.
    with open("/tmp/combined_csv.csv", "r") as f:
        print(f.read())
I would like to avoid manually adding the CSV files to the same zip folder as my Python script every time, and I would like the output Master CSV file to be written to a separate S3 bucket.
I would recommend that you do this using Amazon Athena.
Use CREATE EXTERNAL TABLE to define the input location in Amazon S3 and its format.
Use CREATE TABLE AS to define the output location in Amazon S3 and its format (CSV Zip), with a query (e.g. SELECT * FROM input-table).
This way, there is no need to download, process and upload the files. It will all be done by Amazon Athena. Plus, if the input files are compressed, the cost is lower because Athena is charged based upon the amount of data read from disk.
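As a rough illustration of the two statements, here is a minimal Python sketch that only builds the SQL text. Everything specific in it is an assumption for illustration: the table names csv_input and csv_master, the example bucket paths, treating every column as a string, and the choice of a simple comma-delimited row format (which does not handle quoted fields). As far as I know, Athena also expects the CREATE TABLE AS output location to be empty, so check that against your setup.

```python
def build_athena_statements(input_location, output_location, columns,
                            input_table="csv_input", output_table="csv_master"):
    """Return (create_external_table_sql, create_table_as_sql) as strings.

    The table names are hypothetical defaults; underscores are used because
    hyphens in table names need quoting.
    """
    # Assumption: every column is read as a string; adjust types as needed.
    column_defs = ",\n  ".join("`{}` string".format(name) for name in columns)

    # Defines where the uploaded CSV files live and how to parse them.
    create_input = (
        "CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    ).format(table=input_table, cols=column_defs, location=input_location)

    # Writes the merged result as delimited text to the output bucket.
    create_output = (
        "CREATE TABLE {out_table}\n"
        "WITH (\n"
        "  format = 'TEXTFILE',\n"
        "  field_delimiter = ',',\n"
        "  external_location = '{location}'\n"
        ")\n"
        "AS SELECT * FROM {in_table}"
    ).format(out_table=output_table, location=output_location,
             in_table=input_table)

    return create_input, create_output


if __name__ == "__main__":
    # Hypothetical bucket names and column names, for illustration only.
    ddl, ctas = build_athena_statements(
        "s3://my-input-bucket/csv/",
        "s3://my-output-bucket/master/",
        ["id", "name", "amount"],
    )
    print(ddl)
    print(ctas)
```

The first statement only needs to run once. The second creates a new table each time, so a repeated merge would need a fresh table name and output location, or the old table dropped first.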
You could call Amazon Athena from the AWS Lambda function. Just make sure it only calls Athena after all the input files are in place.
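A minimal sketch of what that Lambda function might look like with boto3 follows. It assumes the function is triggered by S3 upload events and that "all the input files are in place" can be judged by counting .csv objects under a prefix. The environment variable names (INPUT_PREFIX, EXPECTED_FILE_COUNT, MERGE_QUERY, ATHENA_DATABASE, ATHENA_RESULTS_LOCATION) and the helper names are hypothetical, and a real setup may need a different readiness check.

```python
import os


def all_inputs_present(s3_client, bucket, prefix, expected_count):
    """Return True once at least expected_count .csv objects exist under prefix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    found = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when no objects match.
        found += sum(
            1 for obj in page.get("Contents", []) if obj["Key"].endswith(".csv")
        )
    return found >= expected_count


def start_merge(athena_client, query, database, results_location):
    """Start the Athena query and return its execution id (does not wait)."""
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": results_location},
    )
    return response["QueryExecutionId"]


def handler(event, context):
    # boto3 is available in the Lambda Python runtime; it is imported here
    # so the helpers above can be exercised without AWS access.
    import boto3

    # The bucket name comes from the S3 event that triggered the function.
    bucket = event["Records"][0]["s3"]["bucket"]["name"]

    # Hypothetical configuration, read from environment variables.
    prefix = os.environ.get("INPUT_PREFIX", "")
    expected = int(os.environ.get("EXPECTED_FILE_COUNT", "1"))

    if not all_inputs_present(boto3.client("s3"), bucket, prefix, expected):
        return {"started": False}

    execution_id = start_merge(
        boto3.client("athena"),
        os.environ["MERGE_QUERY"],
        os.environ["ATHENA_DATABASE"],
        os.environ["ATHENA_RESULTS_LOCATION"],
    )
    return {"started": True, "query_execution_id": execution_id}
```

The function's execution role would need permission to list the input bucket, run Athena queries, and write to the output and query-results locations. The query runs asynchronously, so the handler returns as soon as Athena accepts it.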