Python

Question

I have a dataframe that contains a column of strings. It looks like this:

[a]
aaa aa a aaaa
bbb bbb b
cc cccc ccc cc ccc

I would like to add 6 columns holding the split values of [a], like this:

[a]                     [a0]    [a1]    [a2]    [a3]    [a4]    [a5]
aaa aa a aaaa           aaa     aa      a       aaaa    NaN     NaN
bbb bbb b               bbb     bbb     b       NaN     NaN     NaN
cc cccc ccc cc ccc      cc      cccc    ccc     cc      ccc     NaN

I use this code :

for i in range(6):
    df["a{}".format(i)] = df['a'].apply(lambda x: x.split(' ')[i])

but I get an 'out of range' error, which makes sense because not all values have the same number of elements.

How can I avoid this error and replace the missing values with None?

Thanks in advance. BR,

EDIT: we never know in advance the length of the string to split. Sometimes it contains 2 occurrences, sometimes 4, etc.

Nickil Maveli · Accepted Answer

You could use str.split with expand=True so that each individual split becomes its own column in a new dataframe.

Reindex the result over a fixed range of 6 columns so that any column the split did not produce is created and filled with NaN, then add a prefix to the column names.

Then concatenate the original and the extracted dataframes column-wise.

import numpy as np
import pandas as pd
str_df = df['a'].str.split(expand=True).reindex(columns=np.arange(6)).add_prefix('a')
pd.concat([df, str_df], axis=1).replace({None: np.nan})  # np.nan: the np.NaN alias was removed in NumPy 2.0
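
For anyone who wants to try this end to end, here is a minimal sketch of the same idea on the sample data from the question. The names `parts`, `result` and `dynamic` are only illustrative, and the missing values are normalised with `where` rather than `replace`, which is a swapped-in choice and not part of the answer above. The last lines show one way to handle the case from the EDIT, where the number of pieces is not known in advance: leaving out the reindex should give as many columns as the longest row needs.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question's column 'a'.
df = pd.DataFrame({"a": ["aaa aa a aaaa", "bbb bbb b", "cc cccc ccc cc ccc"]})

# Split on whitespace; expand=True yields one column per token (0, 1, 2, ...).
parts = df["a"].str.split(expand=True)

# Force exactly six columns (absent ones are created empty), then label them a0..a5.
parts = parts.reindex(columns=range(6)).add_prefix("a")

# Normalise every missing entry (None or NaN) to NaN.
parts = parts.where(parts.notna(), np.nan)

# Put the new columns next to the original one.
result = pd.concat([df, parts], axis=1)
print(result)

# Unknown number of tokens: skip the reindex and let pandas size the frame.
dynamic = df["a"].str.split(expand=True).add_prefix("a")
print(dynamic.shape)
```

The fixed-width version is handy when later code expects a0 through a5 to exist; the dynamic version avoids guessing the width at all.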


mircealungu · Answer

You're almost there :) All you have to do is to add the following small condition at the end of your current lambda function:

if len(x.split(" "))>i else None

Your code becomes:

for i in range(6):
    # Take the i-th piece only if the split produced at least i+1 pieces
    df["a{}".format(i)] = df['a'].apply(lambda x: x.split(' ')[i] if len(x.split(' ')) > i else None)
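
A possible variation on the loop above, sketched on the question's sample data: it splits each string once instead of twice by moving the check into a small helper. `token_at` is a hypothetical name introduced here for illustration, not something from pandas or from the answer.

```python
import pandas as pd

# Sample data mirroring the question's column 'a'.
df = pd.DataFrame({"a": ["aaa aa a aaaa", "bbb bbb b", "cc cccc ccc cc ccc"]})


def token_at(text, i):
    """Return the i-th space-separated piece of text, or None if there is none.

    Hypothetical helper, written only for this example.
    """
    tokens = text.split(" ")
    return tokens[i] if i < len(tokens) else None


for i in range(6):
    # The lambda is applied immediately, so it sees the current value of i.
    df["a{}".format(i)] = df["a"].apply(lambda x: token_at(x, i))

print(df)
```

This keeps the shape of the original loop, so it is easy to adapt if the pieces need extra per-value processing; for large frames the vectorised str.split approach is likely to be faster.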

Python - split string into multiples columns [duplicate]

Tags: string, split, pandas

Cascador84
