How to quickly convert string IDs in a pandas DataFrame to integer IDs?

I have a data set containing a user ID, an item ID (both strings), and a rating, like this:

A12VH45Q3H5R5I B000NWJTKW 5.0
A3J8AQWNNI3WSN B000NWJTKW 4.0
A1XOBWIL4MILVM B000NWJTKW 1.0

I'd like to convert the IDs to integers, like this:

1              1          5.0
2              1          4.0
3              1          1.0

I tried the traditional approach: building a large dictionary that maps each string ID to an integer. But it took an extremely long time. Could you please tell me a faster way to do this? Thanks in advance.

user5779223 asked Oct 28 '25 03:10



2 Answers

You could also encode the column as a categorical and then get the codes.

>>> df['User_ID_code'] = df.User_ID.astype('category').cat.codes
>>> df
          User_ID     Item_ID  Rating  User_ID_code
0  A12VH45Q3H5R5I  B000NWJTKW       5             0
1  A3J8AQWNNI3WSN  B000NWJTKW       4             2
2  A1XOBWIL4MILVM  B000NWJTKW       1             1
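
For reference, here is a minimal self-contained sketch of the categorical approach, assuming the column names `User_ID`, `Item_ID`, and `Rating` (the question does not name them). Category codes start at 0 and follow the sorted order of the strings, so the sketch adds 1 to get IDs starting at 1, as the question asks. The `id_to_user` lookup is a hypothetical helper name for mapping integer IDs back to the original strings.

```python
import pandas as pd

# Sample rows from the question; the column names are assumed for illustration.
df = pd.DataFrame({
    "User_ID": ["A12VH45Q3H5R5I", "A3J8AQWNNI3WSN", "A1XOBWIL4MILVM"],
    "Item_ID": ["B000NWJTKW", "B000NWJTKW", "B000NWJTKW"],
    "Rating": [5.0, 4.0, 1.0],
})

# Encode the strings as a categorical; codes are assigned in sorted category order.
user_cat = df["User_ID"].astype("category")

# Codes start at 0, so shift by 1 for 1-based IDs (cast to int64 to avoid small int dtypes).
df["User_ID_code"] = user_cat.cat.codes.astype("int64") + 1

# Hypothetical reverse lookup: integer ID -> original string ID.
id_to_user = dict(enumerate(user_cat.cat.categories, start=1))
```

Because the codes follow sorted order rather than order of appearance, the second row gets ID 3 here, not 2.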
Alexander answered Oct 29 '25 18:10



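You can apply pd.factorize to each ID column (here the columns are labeled 0 and 1). Adding 1 makes the IDs start at 1 instead of 0: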

In [244]:
df[[0,1]] = df[[0,1]].apply(lambda x: pd.factorize(x)[0] + 1)
df

Out[244]:
   0  1  2
0  1  1  5
1  2  1  4
2  3  1  1
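
The same idea can be sketched with named columns, assuming the names `User_ID`, `Item_ID`, and `Rating` (not given in the question). `pd.factorize` returns both the integer codes and the unique values, numbered in order of first appearance, so the unique values can be kept to map IDs back to the original strings. The variable names below are illustrative only.

```python
import pandas as pd

# Sample rows from the question; the column names are assumed for illustration.
df = pd.DataFrame({
    "User_ID": ["A12VH45Q3H5R5I", "A3J8AQWNNI3WSN", "A1XOBWIL4MILVM"],
    "Item_ID": ["B000NWJTKW", "B000NWJTKW", "B000NWJTKW"],
    "Rating": [5.0, 4.0, 1.0],
})

# factorize returns (codes, uniques); codes are 0-based, in order of first appearance.
user_codes, user_uniques = pd.factorize(df["User_ID"])
item_codes, item_uniques = pd.factorize(df["Item_ID"])

# Replace the string columns with 1-based integer IDs.
df["User_ID"] = user_codes + 1
df["Item_ID"] = item_codes + 1

# To recover the original string for integer ID k, index the uniques at k - 1.
original_user_for_id_2 = user_uniques[2 - 1]
```

Unlike category codes, which follow sorted order, these IDs follow the order in which each string first appears, which matches the output the question asks for.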
EdChum answered Oct 29 '25 17:10



