I have a dataframe which contains a column of strings. It looks like this:
[a]
aaa aa a aaaa
bbb bbb b
cc cccc ccc cc ccc
What I would like is to add 6 columns holding the split values of [a], like this:
[a]                 [a0]  [a1]  [a2]  [a3]  [a4]  [a5]
aaa aa a aaaa       aaa   aa    a     aaaa  NaN   NaN
bbb bbb b           bbb   bbb   b     NaN   NaN   NaN
cc cccc ccc cc ccc  cc    cccc  ccc   cc    ccc   NaN
I use this code:
for i in range(6):
    df["a{}".format(i)] = df["a"].apply(lambda x: x.split(' ')[i])
but I get an 'index out of range' error, which can be explained by the fact that not all values have the same number of elements.
How can I avoid this error and replace the missing values with None?
Thanks in advance. BR,
EDIT: we never know in advance the length of the string to split. Sometimes it contains 2 occurrences, sometimes 4, etc.
You could use str.split with expand=True so that each split token lands in its own column of a new dataframe.
Reindex the columns of that result with np.arange(6) so that positions no row fills are created as all-NaN columns, then add the prefix 'a' to the column names.
Finally, concatenate the original and the extracted DataFrames column-wise.
import numpy as np
import pandas as pd
str_df = df['a'].str.split(expand=True).reindex(columns=np.arange(6)).add_prefix('a')
pd.concat([df, str_df], axis=1).replace({None: np.nan})
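For reference, here is a minimal end-to-end sketch of the same approach, run on the sample data from the question. The names `parts` and `result` are only illustrative, and the final `replace` step is left out because a missing-value check such as `isna()` treats None and NaN alike.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"a": ["aaa aa a aaaa", "bbb bbb b", "cc cccc ccc cc ccc"]})

# Split on whitespace; expand=True puts each token in its own column (0, 1, 2, ...)
parts = df["a"].str.split(expand=True)

# Force exactly six columns (0..5); positions no row fills become all-NaN columns
parts = parts.reindex(columns=range(6)).add_prefix("a")

# Put the new columns next to the original one
result = pd.concat([df, parts], axis=1)
print(result)
```

If the maximum number of tokens is not known in advance (as in the EDIT), dropping the `reindex` step should leave as many columns as the longest row needs.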

You're almost there :) All you have to do is to add the following small condition at the end of your current lambda function:
if len(x.split(" "))>i else None
Your code becomes:
for i in range(6):
    df["a{}".format(i)] = df["a"].apply(lambda x: x.split(' ')[i] if len(x.split(' ')) > i else None)
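The same idea can be written so that each string is split only once per call, by moving the check into a small helper. This is just a sketch: `token_at` is a hypothetical name, not part of pandas, and the dataframe is rebuilt from the sample data in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"a": ["aaa aa a aaaa", "bbb bbb b", "cc cccc ccc cc ccc"]})

def token_at(text, i):
    """Return the i-th space-separated token of text, or None if there is none."""
    tokens = text.split(" ")
    return tokens[i] if i < len(tokens) else None

for i in range(6):
    # args=(i,) passes the current position to token_at for every row
    df["a{}".format(i)] = df["a"].apply(token_at, args=(i,))

print(df)
```

Compared with the lambda version, this avoids calling `split` twice per value and keeps the bounds check in one readable place.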