Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Python copy_from with different column count on source and destination

I am trying a new ETL process using Python from CSV source to PostgreSQL database.

I already made the table for the destination. However, I have create_at column at my table on the database with CURRENT_DATE as default value. On the other hand, I don't have create_at column on the CSV file.

The WP_SALES table at the database consist of:

id (int) PK
order_date (timestamp)
order_status (character varying)
customer_id (smallint)
product (character varying)
product_category (character varying)
quantity (smallint)
total_price (money)
create_at (date) DEFAULT CURRENT_DATE

While on the CSV, it consist of:

id 
order_date 
order_status 
customer_id 
product 
product_category 
quantity 
total_price

And here is the code I tried:

import psycopg2
conn = psycopg2.connect ("host=localhost dbname=postgres user=postgres  port=5432")
cur = conn.cursor()
with open('[Technical Test - Data Engineer] Sale Report - wp.csv', 'r') as source:
    next(source)
    cur.copy_from(source, 'public."WP_SALES"', sep=',')

conn.commit()

I expect the output would be all of the data on the CSV loaded into the table, with the created_at column filled with its default value (CURRENT_DATE).

What I get is this error:

Traceback (most recent call last):
  File "D:\Warung Pintar\TESTQuery", line 8, in <module>
  cur.copy_from(source, 'public."WP_SALES"', sep=',')
psycopg2.DataError: missing data for column "create_at"
CONTEXT:  COPY WP_SALES, line 1:     "127530,2018-10-20T03:41:14,sale,1645,ABC001,Minuman Sachet,2,19400"

[Finished in 0.2s]

I hope to solve the problem without tweaking the CSV file.

Thanks in advance.

like image 469
Mohammad Iqbal Avatar asked Oct 21 '25 19:10

Mohammad Iqbal


1 Answers

As mentioned in the psycopg2 documentation copy_from() has a named parameter for columns

columns – iterable with name of the columns to import. The length and types should match the content of the file to read. If not specified, it is assumed that the entire table matches the file structure.

So the follow should be what you need

cur.copy_from(source, 'public."WP_SALES"', sep=',', columns=['id', 'order_date', 'order_status', 'customer_id', 'product', 'product_category', 'quantity', 'total_price'])
like image 89
badger0053 Avatar answered Oct 23 '25 08:10

badger0053



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!