Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to merge two CSV files based on matching values in different columns and keep unmatched rows with placeholders?

I'm working on a data cleaning task and could use some help. I have two CSV files with thousands of rows each:

File A contains product shipment records. File B contains product descriptions and categories. Here’s a simplified example:

File A (shipments.csv):

shipment_id,product_code,quantity,date
S001,P123,10,2025-07-01
S002,P456,5,2025-07-02
S003,P789,8,2025-07-03 

File B (products.csv):

product_code,description,category
P123,Widget A,Tools
P456,Widget B,Hardware

I want to create a merged file where each row from File A is enriched with the matching product description and category from File B (based on product_code). If there's no match, I’d like to keep the row from File A and fill the missing columns with "N/A".

Expected Output:

shipment_id,product_code,quantity,date,description,category
S001,P123,10,2025-07-01,Widget A,Tools
S002,P456,5,2025-07-02,Widget B,Hardware
S003,P789,8,2025-07-03,N/A,N/A

I tried using pandas.merge() in Python but it drops unmatched rows unless I use how='left', and I’m not sure how to fill missing values properly.

Any help? Thanks in advance!


1 Answers

As tagged, using awk (note the order of the files):

awk -F , -v OFS=, '
  NR == FNR {a[$1] = $2 OFS $3; next}
  {print $0, ($2 in a? a[$2]: "N/A" OFS "N/A")}
' products.csv shipments.csv 
shipment_id,product_code,quantity,date,description,category
S001,P123,10,2025-07-01,Widget A,Tools
S002,P456,5,2025-07-02,Widget B,Hardware
S003,P789,8,2025-07-03,N/A,N/A
like image 96
pmf Avatar answered Oct 24 '25 13:10

pmf



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!