My CSV data looks like this:
ID;name;info
1;ABC;text1
2;DEF;text2;text3
3;GHI;text4;
4;JKL;text5;text6;text7
There are 3 named columns. The additional unnamed columns all relate to the last one (info), and the number of additional columns is not known in advance.
Using df = pd.read_csv(filename, delimiter=";", dtype=object) raises an "Error tokenizing data. C error..." because the rows have different numbers of fields.
Is it possible to merge the last columns into one column containing a list, to achieve the result below?
ID;name;info
1;ABC;[text1]
2;DEF;[text2, text3]
3;GHI;[text4]
4;JKL;[text5, text6, text7]
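Here is a general approach: read each line as a single string, count the delimiters in the header, and build the DataFrame from that count:
import pandas as pd
# With the default "," separator each line lands in one string column (assumes the data contains no commas).
data = pd.read_csv("text.csv")
n_sep = data.columns[0].count(";")  # number of delimiters in the header
headers = data.columns.str.split(";")[0]  # ['ID', 'name', 'info']
# Split at most n_sep times so everything after the last named column stays in 'info'.
data[headers] = data.iloc[:, 0].str.split(";", n=n_sep, expand=True)
# Drop the raw column, strip trailing delimiters, then split 'info' into a list.
data = data.iloc[:, 1:].assign(info=data["info"].str.rstrip(";").str.split(";"))
Output:
ID name info
0 1 ABC [text1]
1 2 DEF [text2, text3]
2 3 GHI [text4]
3 4 JKL [text5, text6, text7]
Without the rstrip(";") step, the trailing delimiter in row 3 would produce an empty string as a second list element.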
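As an alternative, the same result can probably be reached without pandas' parser by reading the rows with the standard csv module and folding the extra fields into a list. This is only a sketch: the helper name read_ragged_csv and the inline sample are made up for illustration, and it assumes every row has at least as many fields as there are named columns.

```python
import csv
import io

import pandas as pd


def read_ragged_csv(source, delimiter=";"):
    """Read delimited text whose rows may carry extra trailing fields.

    Fields beyond the named columns are folded into the last column as a list.
    (Hypothetical helper, not part of pandas.)
    """
    reader = csv.reader(source, delimiter=delimiter)
    header = next(reader)
    n_fixed = len(header) - 1  # number of columns before the list-valued one
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Drop empty strings left behind by trailing delimiters.
        extras = [f for f in fields[n_fixed:] if f != ""]
        rows.append(fields[:n_fixed] + [extras])
    return pd.DataFrame(rows, columns=header)


# Inline copy of the sample data from the question.
sample = io.StringIO(
    "ID;name;info\n"
    "1;ABC;text1\n"
    "2;DEF;text2;text3\n"
    "3;GHI;text4;\n"
    "4;JKL;text5;text6;text7\n"
)
df = read_ragged_csv(sample)
print(df)
```

For a real file, pass an open file object instead of the StringIO, e.g. open(filename, newline=""). All values stay strings here, which matches the dtype=object used in the question.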