I am working with a poorly-formed CSV file; it has duplicate fieldnames.
csv.DictReader simply overwrites the value of the first column with the value of a later column that has the same name, but I need the contents of both duplicate columns.
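To see the overwriting, here is a minimal sketch that feeds a small in-memory sample (made up to mirror the question's header) through csv.DictReader. Each row is built by zipping the header with the values into a dict, so the later price1 wins:

```python
import csv
import io

# Made-up sample data with a repeated 'price1' header, mirroring the question.
data = "product,price1,price2,price1\ncar,100,300,200\n"

reader = csv.DictReader(io.StringIO(data))
row = next(reader)
print(reader.fieldnames)  # ['product', 'price1', 'price2', 'price1']
print(row)  # {'product': 'car', 'price1': '200', 'price2': '300'} on Python 3.8+
```

The header itself still lists both price1 columns; only the per-row dictionary loses the first value.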
I can't assign DictReader's fieldnames parameter by hand, because there are about one hundred columns and the number of columns is different every time, e.g.:
product, price1, price2, price1,...,price100
car, 100, 300, 200,...,350
output: {'product':'car', 'price1': 200, 'price2':300}
I need: {'product':'car', 'price1': 100, 'price2':300, 'price3': 200}
What is the best way to do this?
csv.reader() lets you access CSV data by index and is ideal for simple CSV files. csv.DictReader(), on the other hand, is friendlier and easier to use, because each row comes back as a dictionary keyed by column name, which helps especially with files that have many columns.
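A small illustration of the difference, assuming a made-up two-column sample held in an io.StringIO instead of a real file:

```python
import csv
import io

# Made-up sample data; io.StringIO stands in for an open file.
data = "product,price\ncar,100\nbike,50\n"

# csv.reader yields each row as a list, so fields are accessed by position.
rows = list(csv.reader(io.StringIO(data)))
print(rows[1][1])  # 100

# csv.DictReader uses the header row as keys, so fields are accessed by name.
records = list(csv.DictReader(io.StringIO(data)))
print(records[0]['price'])  # 100
```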
If you need to re-read the file, you can either close and re-open it, or seek() back to the beginning, e.g. by adding ordersFile.seek(0) between your loops.
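A minimal sketch of the seek(0) approach, assuming an in-memory file (io.StringIO) in place of the real ordersFile; orders_file and its columns are hypothetical:

```python
import csv
import io

# Hypothetical stand-in for a file opened with open(..., newline='').
orders_file = io.StringIO("id,qty\n1,5\n2,7\n")

first_pass = [row['qty'] for row in csv.DictReader(orders_file)]

orders_file.seek(0)  # rewind; without this the second loop sees no rows
second_pass = [row['qty'] for row in csv.DictReader(orders_file)]

print(first_pass, second_pass)  # ['5', '7'] ['5', '7']
```

A fresh DictReader is created for the second pass so that it reads the header row again after the rewind.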
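Some CSV libraries treat the document strictly as tabular data, so the header cannot contain duplicate entries; if it does, an exception is thrown on use. Python's csv.DictReader is not that strict: it raises nothing and silently keeps only the last value for each repeated name.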
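If you do want that strict behavior, you could check the header yourself before building any dictionaries. check_unique_header below is a hypothetical helper, not part of the csv module:

```python
import csv
import io
from collections import Counter


def check_unique_header(header):
    """Hypothetical strict check: raise ValueError if any column name repeats."""
    dupes = [name for name, n in Counter(header).items() if n > 1]
    if dupes:
        raise ValueError(f"duplicate column names: {dupes}")
    return header


reader = csv.reader(io.StringIO("product,price1,price2,price1\ncar,100,300,200\n"))
try:
    check_unique_header(next(reader))
except ValueError as exc:
    print(exc)  # duplicate column names: ['price1']
```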
Don't use a DictReader() in this case. Stick to a regular reader instead.
You can always map to a dictionary based on a re-mapped list of fieldnames:
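import csv

with open(filename, newline='') as csvfile:  # Python 3; Python 2 used 'rb'
    # skipinitialspace=True strips the blank after each comma in the sample data
    reader = csv.reader(csvfile, skipinitialspace=True)
    fieldnames = remap(next(reader))  # rename duplicate headers before use
    for row in reader:
        row = dict(zip(fieldnames, row))  # each column now has its own key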
where the remap() function could either renumber your numbered columns or append extra information if column names are duplicated.
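The second option, appending extra information only to repeated names, might look like the sketch below. remap_with_suffix and the _2, _3 suffix scheme are assumptions made for illustration:

```python
def remap_with_suffix(fieldnames):
    """Hypothetical variant: keep the first occurrence, suffix repeats _2, _3, ..."""
    seen = {}
    result = []
    for name in fieldnames:
        seen[name] = seen.get(name, 0) + 1
        # First occurrence keeps its name; later ones get a numeric suffix.
        result.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return result


print(remap_with_suffix(['product', 'price1', 'price2', 'price1']))
# ['product', 'price1', 'price2', 'price1_2']
```

Unlike renumbering, this leaves unique names untouched; note that a generated name could still clash with a real column that already has that name.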
Re-numbering could be as easy as:
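from itertools import count

def remap(fieldnames):
    # Give every column whose name starts with 'price' the next number in
    # sequence (price1, price2, price3, ...), so repeated names become distinct.
    price_count = count(1)
    return ['price{}'.format(next(price_count)) if f.startswith('price') else f
            for f in fieldnames]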
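Putting the pieces together on the sample row from the question gives the desired result. This sketch assumes the data comes from an in-memory string rather than a file, and uses skipinitialspace=True because the sample has a space after each comma:

```python
import csv
import io
from itertools import count


def remap(fieldnames):
    # Renumber every 'price...' column in order of appearance.
    price_count = count(1)
    return ['price{}'.format(next(price_count)) if f.startswith('price') else f
            for f in fieldnames]


# Sample from the question; io.StringIO stands in for the real file.
data = "product, price1, price2, price1\ncar, 100, 300, 200\n"
reader = csv.reader(io.StringIO(data), skipinitialspace=True)
fieldnames = remap(next(reader))
rows = [dict(zip(fieldnames, row)) for row in reader]
print(rows)
# [{'product': 'car', 'price1': '100', 'price2': '300', 'price3': '200'}]
```

The values come back as strings ('100', not 100); convert them with int() or float() if you need numbers. remap() renumbers every column whose name starts with price in column order, which matches the output the question asks for.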