
Read AWS S3 CSV Column Names in Lambda

I am trying to write a script that collects the schema from an AWS Aurora Serverless MySQL database table, collects the column headers from a CSV file stored in an AWS S3 bucket, and only writes the CSV to the table if its column headers are a subset of the schema (e.g., if the table fields are ['Name', 'DOB', 'Height'] but the CSV fields are ['Name', 'DOB', 'Weight'], the script will throw an exception).
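
To be concrete, here is a minimal sketch of the check I have in mind (the function name validate_csv_cols is illustrative), assuming the schema and the CSV headers both arrive as lists of strings:

def validate_csv_cols(schema, csv_cols):
    # Raise if the CSV has any column that is not in the table schema.
    extra = set(csv_cols) - set(schema)
    if extra:
        raise ValueError('CSV columns not in table schema: {}'.format(sorted(extra)))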

Thus far, I have successfully returned the table schema in an AWS Lambda function and successfully read in the CSV file, but I am not sure how to get the column headers from the S3 object.

import boto3
import pymysql

# rds_host, name, password, db_name, s3_bucket and test_key are assumed
# to be defined elsewhere (e.g. read from environment variables).

def return_db_schema(event):
    schema = []
    conn = pymysql.connect(host=rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
    with conn.cursor() as cur:
        cur.execute('SHOW columns FROM SampleTable')
        # Read the rows before the with-block closes the cursor;
        # the first field of each SHOW COLUMNS row is the column name.
        for row in cur:
            schema.append(row[0])
    return schema

def return_csv_cols(event):
    s3 = boto3.client('s3')
    tester = s3.get_object(Bucket=s3_bucket, Key=test_key)
    contents = tester['Body'].read()  # raw bytes of the whole object

def main(event, context):
    print(return_db_schema(event))
    print()
    print(return_csv_cols(event))

I am unsure how to proceed from here (e.g., is there a way to do this without loading the CSV into a pandas DataFrame and checking df.columns, or something like that?).


1 Answer

I have solved this with the following code:

    def return_csv_cols(event):
        s3 = boto3.client('s3')
        tester = s3.get_object(Bucket=s3_bucket, Key=test_key)
        # Decode the raw bytes, then take the first line as the
        # header row and split it on commas.
        contents = tester['Body'].read().decode('UTF-8')
        cols = contents.split('\n')[0].split(',')
        return cols, contents
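
One caveat: contents.split('\n')[0] leaves a trailing '\r' on the last column name if the file uses CRLF line endings, and a plain comma split breaks on quoted header fields such as "Last, First". If that matters, the standard csv module parses the header row robustly. A minimal sketch of that variant (return_csv_cols_safe is an illustrative name), reusing the same s3_bucket and test_key from the question:

    import csv
    import io

    def return_csv_cols_safe(event):
        s3 = boto3.client('s3')
        tester = s3.get_object(Bucket=s3_bucket, Key=test_key)
        contents = tester['Body'].read().decode('UTF-8')
        # csv.reader honours quoting and strips the line ending, so
        # quoted commas do not produce spurious columns.
        cols = next(csv.reader(io.StringIO(contents)))
        return cols, contents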