I'm using the following code to read a CSV file in chunks using pandas read_csv
headers = ["1","2","3","4","5"]
fields = ["1", "5"]
for chunk in pandas.read_csv(fileName, names=headers, header=0, usecols=fields, chunksize=chunkSize):
Sometimes my CSV won't have column "5" and I want to be able to handle this case and specify some default values. Is there a way to read just the headers of my CSV file without reading the whole file so I can handle this manually? Or may be any other clever way to default the value for the missing column?
If you pass nrows=0 this reads just the column row, you can then call intersection to find the common column values and avoid any errors:
In[14]:
t="""1,2,3,5,6
0,1,2,3,4"""
headers = ["1","2","3","4","5"]
fields = ["1", "5"]
cols = pd.read_csv(io.StringIO(t), nrows=0).columns
cols
Out[14]: Index(['1', '2', '3', '5', '6'], dtype='object')
So now we have column names we can call intersection to find the valid columns against your expected and actual columns:
In[15]:
valid_cols = cols.intersection(headers)
valid_cols
Out[15]: Index(['1', '2', '3', '5'], dtype='object')
You can do the same with fields and then you can pass these to your current code to avoid any exceptions
Just to demonstrate that passing nrows=0 just reads the header row:
In[16]:
pd.read_csv(io.StringIO(t), nrows=0)
Out[16]:
Empty DataFrame
Columns: [1, 2, 3, 5, 6]
Index: []
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With