csv module returning a BOM for first column

I have a csv file formatted like this:

type,type_mapping, style,style_mapping,Count
Residential,Residential,Antique,Antique,109
Antique,Residential,Antique,Antique,48
Apt/Garage,Commercial,Apt/Garage,Apartment,1

I am parsing it using the csv module in Python (version 3). Here is my code:

import os
import csv

typeXref = dict()
with open('xref.csv') as csvData:
    csvRead = csv.reader(csvData)
    headers = next(csvRead)

    for index, row in enumerate(csvRead):
        typeXref[index] = {key: value for key, value in zip(headers, row)} 

print(typeXref)

For some reason, the first column in the header always comes back with the byte order mark \ufeff prepended.

408: {'\ufefftype': 'Residential', 'type_mapping': 'Residential', 
      ' style': 'Antique', 'style_mapping': 'Antique', 'Count': '109'}}

I assume this is due to the way I'm opening the file, reading the content with the csv module, or generating the file.

I can figure out how to decode that one field, but I would rather ensure I'm generating the file correctly or using the csv module properly.

Dom DaFonte asked Dec 02 '25 07:12


1 Answer

You have to tell open() that you are reading a UTF-8 file with a BOM:

with open('xref.csv', encoding='utf-8-sig') as csvData:
    ....

The utf-8-sig codec then strips the BOM while decoding, so the first header comes back as plain 'type'.
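If you want to see the difference for yourself, here is a minimal sketch. It assumes a throwaway file written to a temporary directory with a shortened, made-up header row (not your real xref.csv), and it writes the BOM on purpose so the two ways of opening the file can be compared side by side.

```python
import csv
import os
import tempfile

# Hypothetical sample data, shortened from the question's file for illustration.
sample = "type,type_mapping,Count\nResidential,Residential,109\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xref.csv")

    # Writing with 'utf-8-sig' prepends a BOM, the way some spreadsheet exports do.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(sample)

    # Plain 'utf-8' keeps the BOM as part of the first field.
    with open(path, encoding="utf-8", newline="") as f:
        headers_plain = next(csv.reader(f))

    # 'utf-8-sig' strips the BOM before the csv module sees the text.
    with open(path, encoding="utf-8-sig", newline="") as f:
        headers_sig = next(csv.reader(f))

print(headers_plain)  # first header still carries '\ufeff'
print(headers_sig)    # first header is clean
```

Reading with utf-8-sig is also safe when the file has no BOM at all, since the codec only skips the mark if it is present. So you don't need to know in advance how the file was generated.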

Guillaume Lebreton answered Dec 04 '25 21:12