I have a CSV file with ";" as separator made like this:
col1;col2;col3;col4
4;hello;world;1;1
4;hi;1;1
4;hi;1;1
4;hi;1;1
Obviously by using ";" as sep it gives me the error about tokenizing data (obviously from the header less columns are expected), how can i obtain a dataframe like this:
col1 col2 col3 col4
4 hello;world 1 1
4 hi 1 1
4 hi 1 1
4 hi 1 1
It could be read even with others packages and other data type (even if i prefer pandas because of the following operations in the code)
You could split off the outer cols until you are left with the remaining col2. This could be done in Pandas as follows:
import pandas as pd
df_raw = pd.read_csv("input.csv", delimiter=None, header=None, skiprows=1)
df_raw[['col1', 'rest']] = df_raw[0].str.split(";", n=1, expand=True)
df_raw[['col2', 'col3', 'col4']] = df_raw.rest.str.rsplit(";", n=2, expand=True)
df = df_raw[['col1', 'col2', 'col3', 'col4']]
print(df)
Giving df as:
col1 col2 col3 col4
0 4 hello;world 1 1
1 4 hi 1 1
2 4 hi 1 1
3 4 hi 1 1
First read in the CSV file without using any delimiters to get a single column.
Use a .str.split() with n=1 to split out just col1 using the ; delimiter from the left.
Take the remaining rest and apply .str.rsplit() with n=2 to do a reverse split using the ; delimiter to get the remaining columns. This allows col2 to have any have any characters.
This assume that only col2 can have additional ; separators and the last two are fixed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With