
Pandas column of list: How to set the dtype of items

Tags: python, pandas

I have a dataframe with multiple columns that contain lists, and the lists vary in length from row to row:

tweetid tweet_date    user_mentions       hashtags
00112   11-02-2014    []                  []
00113   11-02-2014    [00113]             [obama, trump]
00114   30-07-2015    [00114, 00115]      [hillary, trump, sanders]
00115   30-07-2015    []                  []

The dataframe is a concatenation of three different dataframes, and I'm not sure whether the items in the lists all have the same dtype. For example, in the user_mentions column, sometimes the data looks like this:

[00114, 00115]

But sometimes it looks like this:

['00114','00115'] 

How can I set the dtype for the items in the lists?

asked Nov 16 '25 01:11 by msmazh


2 Answers

Pandas DataFrames are not really designed to hold lists as cell values, which is why you are facing difficulty. You can convert the items of each list with apply:

Python 3.x:

df['user_mentions'] = df['user_mentions'].apply(lambda x: list(map(int, x)))

Python 2.x:

df['user_mentions'] = df['user_mentions'].apply(lambda x: map(int, x))

In Python 3, map returns a lazy map object, so you have to wrap it in list() to get an actual list. In Python 2, map already returns a list, so no explicit conversion is needed.

In the lambda above, x is the list stored in one row, and each of its values is mapped to int.
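As a minimal sketch of this conversion on a small frame: the sample data and the helper name to_int_list are assumptions made for illustration, not taken from the question. It uses a list comprehension instead of map, which behaves the same way here.

```python
import pandas as pd

# Hypothetical sample data mixing str and int items, as described in the question.
df = pd.DataFrame({
    "tweetid": ["00112", "00113", "00114"],
    "user_mentions": [[], ["00113"], [114, "00115"]],
})


def to_int_list(items):
    """Convert every item of one row's list to int (empty lists stay empty)."""
    return [int(item) for item in items]


# apply returns a new Series, so the result is assigned back to the column.
df["user_mentions"] = df["user_mentions"].apply(to_int_list)

# int("00113") is 113, so leading zeros are not preserved by this approach.
print(df["user_mentions"].tolist())
```

If the leading zeros matter (for example, because the IDs must match the tweetid strings), converting to str rather than int may be the safer choice.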

answered Nov 17 '25 15:11 by d_kennetz


df['user_mentions'].map(lambda x: ['00' + str(y) if isinstance(y, int) else y for y in x])

If your objective is to convert all user_mentions to str, the above might help. I would also look into this post for unnesting. As mentioned, pandas is not really designed to house lists as values.
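A rough sketch of the string route follows, under some assumptions: the sample data is made up, IDs are taken to be five characters wide as in the question's examples (ID_WIDTH is a hypothetical constant), and zero-padding with str.zfill stands in for the fixed '00' prefix. It first checks which item types actually occur, then normalizes them, then unnests the lists with DataFrame.explode, which is available in pandas 0.25 and later.

```python
import pandas as pd

# Hypothetical sample data; the second list mixes int and str items.
df = pd.DataFrame({
    "tweetid": ["00113", "00114"],
    "user_mentions": [["00113"], [114, "00115"]],
})

# Collect the names of the Python types that occur inside the lists.
item_types = {type(item).__name__ for items in df["user_mentions"] for item in items}
print(item_types)

ID_WIDTH = 5  # assumption: IDs are zero-padded to five characters


def to_str_list(items):
    """Convert every item to a zero-padded string of ID_WIDTH characters."""
    return [str(item).zfill(ID_WIDTH) for item in items]


# map returns a new Series, so assign it back to the column.
df["user_mentions"] = df["user_mentions"].map(to_str_list)

# Unnest: one row per mention, with the other columns repeated.
exploded = df.explode("user_mentions")
print(exploded)
```

Once the column is exploded, each cell holds a single scalar, so the usual tools such as astype work on it directly.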

answered Nov 17 '25 15:11 by Francisco