 

Python Postgres Package: psycopg2 copy_from vs copy_expert

Requirement: load millions of rows from S3 into a table using Python, while avoiding memory issues.

I see that psycopg2 offers two methods: copy_from and copy_expert.

Which of these is more efficient and avoids memory issues?

Also, I see that Redshift (which is based on PostgreSQL) supports a COPY command to load data from an S3 file, but I am not sure whether plain PostgreSQL supports such a feature.
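On the S3 question: plain PostgreSQL's COPY can read only from the server's filesystem or from STDIN; unlike Redshift it cannot pull directly from S3 (managed variants such as RDS/Aurora add an aws_s3 extension for that). From a Python client, the usual approach is to stream the S3 object through copy_expert ... FROM STDIN. As a minimal sketch (the bucket, key, and table names below are placeholders, not from the question), a small adapter turns any iterator of byte chunks, such as boto3's StreamingBody.iter_chunks(), into the file-like object copy_expert expects, so the whole file never sits in memory at once:

```python
import io


class IterStream(io.RawIOBase):
    """File-like adapter over an iterator of bytes chunks.

    copy_expert() only calls .read(size) on its file argument, so wrapping
    a streamed S3 body this way keeps memory bounded by the chunk size.
    """

    def __init__(self, chunk_iter):
        self._chunks = iter(chunk_iter)
        self._leftover = b""

    def readable(self):
        return True

    def readinto(self, b):
        # Refill the internal buffer from the iterator as needed.
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0  # EOF
        n = min(len(b), len(self._leftover))
        b[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


# Hypothetical usage with boto3 and psycopg2 (all names are placeholders):
# body = boto3.client("s3").get_object(Bucket="my-bucket", Key="rows.csv")["Body"]
# stream = io.BufferedReader(IterStream(body.iter_chunks(1024 * 1024)))
# with conn.cursor() as cursor:
#     cursor.copy_expert("COPY my_table FROM STDIN (FORMAT csv)", stream)
```

This avoids ever materializing the full S3 object in memory, which is the main risk when loading millions of rows from the client side.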

Kar asked Mar 15 '26 19:03


1 Answer

My implementation changed from copy_from to copy_expert. An extensive analysis of loading data into PostgreSQL can be found here: https://hakibenita.com/fast-load-data-python-postgresql.

COPY_FROM

import io

import pandas as pd
import psycopg2

def insert_with_string_io(df: pd.DataFrame, table_name: str):
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    with conn.cursor() as cursor:  # conn: an open psycopg2 connection
        try:
            # copy_from uses PostgreSQL's text format with the given separator
            cursor.copy_from(file=buffer, table=table_name, sep=",", null="")
        except (Exception, psycopg2.DatabaseError) as error:
            print("Error: %s" % error)
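One caveat with copy_from worth noting: it uses PostgreSQL's text format, which splits rows on the separator and does not understand CSV quoting. A value that itself contains a comma will therefore shift the columns, while pandas' to_csv quotes such values and copy_expert with FORMAT csv parses them correctly. A quick illustration (no database needed):

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1], "name": ["Doe, Jane"]})
buffer = io.StringIO()
df.to_csv(buffer, index=False, header=False)

line = buffer.getvalue().strip()
# pandas quotes the field containing the separator:
print(line)  # 1,"Doe, Jane"
# Naive splitting on "," (roughly what copy_from's text format does)
# yields three fields instead of two -- the columns would be misaligned:
print(line.split(","))
```

This is one practical reason to prefer copy_expert with FORMAT csv when the data comes from to_csv.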

COPY_EXPERT

def insert_with_string_io(df: pd.DataFrame):
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)
    with conn.cursor() as cursor:  # conn: an open psycopg2 connection
        try:
            # <database>.<schema>.<table> is a placeholder for the target table
            cursor.copy_expert(
                "COPY <database>.<schema>.<table> FROM STDIN (FORMAT csv, HEADER false)",
                buffer,
            )
        except (Exception, psycopg2.DatabaseError) as error:
            print("Error: %s" % error)
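Both versions above still build the entire CSV in one StringIO, so for millions of rows peak memory is the full serialized DataFrame. A hedged sketch of one way to bound that (the table name and conn are assumptions, not from the answer): slice the DataFrame and stream one chunk at a time, so memory peaks at a single chunk's worth of CSV:

```python
import io

import pandas as pd


def iter_csv_chunks(df: pd.DataFrame, chunk_size: int = 100_000):
    """Yield one CSV-formatted buffer per slice of the DataFrame."""
    for start in range(0, len(df), chunk_size):
        buffer = io.StringIO()
        df.iloc[start:start + chunk_size].to_csv(buffer, index=False, header=False)
        buffer.seek(0)
        yield buffer


# Hypothetical usage (conn and the table name are placeholders):
# with conn.cursor() as cursor:
#     for buffer in iter_csv_chunks(df):
#         cursor.copy_expert("COPY my_table FROM STDIN (FORMAT csv)", buffer)
# conn.commit()
```

Each COPY here runs in the same transaction, so the load stays atomic if conn.commit() is called only once at the end.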
Hale4029 answered Mar 19 '26 09:03


