I have a data.csv file like this (the two blocks are separated by a blank line):
Col1,Col2,Col3,Col4,Col5
10,12,14,15,16
18,20,22,24,26
28,30,32,34,36
38,40,42,44,46
48,50,52,54,56

Col6,Col7
11,12
13,14
...
I want to read only the data in columns Col1 to Col5; I don't need Col6 and Col7.
I tried reading this file using
df = pd.read_csv('data.csv',header=0)
but it throws this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 3: invalid start byte
Then I tried this:
df = pd.read_csv('data.csv',header=0,error_bad_lines=True)
This doesn't give the desired result either. How can I read only up to the first blank line in the CSV file?
You can write a generator that reads the file line by line and stops at the first blank line, then pass the collected lines to pandas:
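import io

import pandas as pd


def file_reader(filename):
    # Yield lines one at a time and stop at the first blank line,
    # so nothing after it (the Col6/Col7 block) is passed on.
    with open(filename) as f:
        for line in f:
            if line.strip():  # whitespace-only lines also count as blank
                yield line
            else:
                break


# Join the kept lines into an in-memory buffer that read_csv can parse.
data = io.StringIO(''.join(file_reader('data.csv')))
df = pd.read_csv(data)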
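A slightly more compact variant of the same idea, offered as a sketch under a few assumptions: `itertools.takewhile` stands in for the hand-written generator, and an `encoding` parameter is exposed because the `UnicodeDecodeError` in the question suggests data.csv may not be UTF-8 (that is a guess; `latin-1` is only an example codec). The helper name `read_first_block` is hypothetical.

```python
import io
import itertools
import os
import tempfile

import pandas as pd


def read_first_block(path, encoding="utf-8"):
    """Return a DataFrame built from the lines before the first blank line.

    Hypothetical helper. If the file is not UTF-8 (one possible cause of the
    UnicodeDecodeError), pass another codec, e.g. encoding="latin-1".
    """
    with open(path, encoding=encoding) as f:
        # takewhile yields lines until the predicate is false, i.e. until the
        # first line that is empty after stripping whitespace; later lines
        # (the Col6/Col7 block) are never consumed.
        kept = itertools.takewhile(lambda line: line.strip(), f)
        return pd.read_csv(io.StringIO("".join(kept)))


# Usage: write the sample from the question to a temporary file and read it.
sample = (
    "Col1,Col2,Col3,Col4,Col5\n"
    "10,12,14,15,16\n"
    "18,20,22,24,26\n"
    "28,30,32,34,36\n"
    "38,40,42,44,46\n"
    "48,50,52,54,56\n"
    "\n"
    "Col6,Col7\n"
    "11,12\n"
    "13,14\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

df = read_first_block(tmp.name)
os.remove(tmp.name)
print(df)  # 5 rows, columns Col1..Col5
```

Because `takewhile` is lazy, lines after the first blank one are never handed to pandas. If the very first line of the file is blank, nothing is kept and `pd.read_csv` raises `EmptyDataError`.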