 

How to write only a subset of fields using DictReader and DictWriter

Tags:

python

csv

A CSV test file called Sample.csv contains:

Brand, Price, Weight, Type
brand1, 6.05, 3.2, orange
brand2, 8.05, 5.2, orange
brand3, 6.54, 4.2, orange
brand1, 6.05, 3.2, pear
brand2, 7.05, 3.6, pear
brand3, 7.45, 3.9, pear
brand1, 5.45, 2.7, apple
brand2, 6.05, 3.2, apple
brand3, 6.43, 3.5, apple
brand4, 7.05, 3.9, apple
brand1, 8.05, 4.2, plum
brand2, 3.05, 2.2, plum

My code is:

import csv
headers = ['Brand','Price','Type']

with open('sample.csv', newline='') as rf:
    reader = csv.DictReader(rf, delimiter=',',fieldnames=headers)
    with open('output.csv', 'w', newline='') as wf:
        writer = csv.DictWriter(wf, delimiter=',', extrasaction='ignore', fieldnames=headers)
        writer.writerow(dict((fn,fn) for fn in writer.fieldnames))
        for row in reader:
           print(row)
           writer.writerow(row)

I'm only trying to get Brand, Price, Type into the output file but am getting:

Brand,Price,Type
Brand, Price, Weight
brand1, 6.05, 3.2
brand2, 8.05, 5.2
brand3, 6.54, 4.2
brand1, 6.05, 3.2
brand2, 7.05, 3.6
brand3, 7.45, 3.9
brand1, 5.45, 2.7
brand2, 6.05, 3.2
brand3, 6.43, 3.5
brand4, 7.05, 3.9
brand1, 8.05, 4.2
brand2, 3.05, 2.2

Why am I getting the Weight field in the output, not the Type field?

Note that the line writer.writerow(dict((fn,fn) for fn in writer.fieldnames)) was added for debugging, intentionally writing the header out twice.

asked Dec 05 '25 by pbradley2525
1 Answer

The fact that you have entered the column names Brand, Price, Type in headers will not make DictReader parse and return just those columns from the source file. It just assigns the columns it finds in the file, in order, to those keys - effectively, you could name them First, Second, Third, and the result would be the same.
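To see this positional relabeling in action, here's a minimal sketch using an in-memory snippet of the sample data (the key names First, Second, Third are made up for illustration):

```python
import csv
import io

# Two lines from the question's sample file, held in memory
data = "Brand, Price, Weight, Type\nbrand1, 6.05, 3.2, orange\n"

# Passing fieldnames makes DictReader treat the header line as data
# and relabel columns purely by position; a fourth column with no
# matching name is collected under the restkey (None by default).
reader = csv.DictReader(io.StringIO(data),
                        fieldnames=['First', 'Second', 'Third'],
                        skipinitialspace=True)
rows = list(reader)
print(rows[0])  # {'First': 'Brand', 'Second': 'Price', 'Third': 'Weight', None: ['Type']}
print(rows[1])  # {'First': 'brand1', 'Second': '6.05', 'Third': '3.2', None: ['orange']}
```

This is exactly the mechanism behind the question's output: with headers = ['Brand', 'Price', 'Type'], the third key 'Type' receives the third column's values - the weights.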

You have two options:

Use the source header row

Use the fact that you have a header row, let DictReader name the keys based on it, and write just the ones you want:

import csv
headers = ['Brand', 'Price', 'Type']

with open('sample.csv') as rf:
    reader = csv.DictReader(rf, delimiter=',', skipinitialspace=True)
    with open('output.csv', 'w') as wf:
        writer = csv.DictWriter(wf, delimiter=',', extrasaction='ignore', fieldnames=headers)
        writer.writeheader()
        for row in reader:
            print(row)
            writer.writerow(row)

Explicitly set the columns (in source & output)

Be explicit about which columns are in the source and which ones you want in the output - use two lists:

import csv
present_headers = ['Brand', 'Price', 'Weight', 'Type']
desired_headers = ['Brand', 'Price', 'Type']

with open('sample.csv') as rf:
    reader = csv.DictReader(rf, delimiter=',', fieldnames=present_headers, skipinitialspace=True)
    with open('output.csv', 'w') as wf:
        writer = csv.DictWriter(wf, delimiter=',', extrasaction='ignore', fieldnames=desired_headers)
        writer.writeheader()
        next(reader, None)   # skip the header row from the source
        for row in reader:
            # print(row)
            writer.writerow(row)

Given the sample in your question, both options produce equivalent results.

A couple of remarks. First, the sample in the question has the delimiter , followed by a space - both in the headers and in the data rows. If it is really there (not a copy & paste artifact), the option skipinitialspace=True in the reader will get rid of it - otherwise it becomes part of the value in each column.
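A quick way to check what skipinitialspace does, using one made-up data line:

```python
import csv
import io

line = "brand1, 6.05, 3.2, orange\n"

# Without skipinitialspace the space after each comma becomes part
# of the value; with it, the values come back clean.
raw = next(csv.reader(io.StringIO(line)))
clean = next(csv.reader(io.StringIO(line), skipinitialspace=True))
print(raw)    # ['brand1', ' 6.05', ' 3.2', ' orange']
print(clean)  # ['brand1', '6.05', '3.2', 'orange']
```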

The other remark: there's no need to manually construct the header row for the DictWriter - that's what the writeheader() method is for.

Finally, in option 2 there's the call next(reader, None) just before writing the data - its purpose is to consume the file's header row, which the reader now treats as an ordinary data row (because fieldnames was given explicitly), so that it isn't duplicated in the output.
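A small sketch of that skip, again with an in-memory snippet of the sample data:

```python
import csv
import io

data = "Brand, Price, Weight, Type\nbrand1, 6.05, 3.2, orange\n"
reader = csv.DictReader(io.StringIO(data),
                        fieldnames=['Brand', 'Price', 'Weight', 'Type'],
                        skipinitialspace=True)

# With explicit fieldnames, the header line comes back as a data row;
# next() consumes it so it never reaches the writer.
skipped = next(reader, None)
first = next(reader)
print(skipped['Brand'])  # prints: Brand  (the header line, read as data)
print(first['Brand'])    # prints: brand1
```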

answered Dec 07 '25 by Todor Minakov
