I have a data set containing a user ID, an item ID (both strings), and a rating, like this:
A12VH45Q3H5R5I B000NWJTKW 5.0
A3J8AQWNNI3WSN B000NWJTKW 4.0
A1XOBWIL4MILVM B000NWJTKW 1.0
I'd like to convert the IDs to integers, like this:
1 1 5.0
2 1 4.0
3 1 1.0
I have tried the traditional way: building a big dictionary that maps each string ID to an integer. But it took an extremely long time. Could you please tell me how to do this faster? Thanks in advance.
You could also encode the column as a categorical and then get the codes.
df['User_ID_code'] = df.User_ID.astype('category').cat.codes
>>> df
User_ID Item_ID Rating User_ID_code
0 A12VH45Q3H5R5I B000NWJTKW 5 0
1 A3J8AQWNNI3WSN B000NWJTKW 4 2
2 A1XOBWIL4MILVM B000NWJTKW 1 1
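Note that the codes above follow the sorted order of the categories, not the order of appearance, and they start at 0. A minimal sketch that applies the same idea to both ID columns and shifts the codes to start at 1, as the question asks; the helper name `encode_sorted` and the rebuilt sample frame are only for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "User_ID": ["A12VH45Q3H5R5I", "A3J8AQWNNI3WSN", "A1XOBWIL4MILVM"],
    "Item_ID": ["B000NWJTKW", "B000NWJTKW", "B000NWJTKW"],
    "Rating": [5.0, 4.0, 1.0],
})

def encode_sorted(column):
    """Hypothetical helper: 1-based integer codes in sorted category order."""
    # cat.codes uses a small integer dtype, so widen it before adding 1.
    # Missing values get code -1, which becomes 0 after the shift.
    return column.astype("category").cat.codes.astype("int64") + 1

encoded = df.copy()
for name in ["User_ID", "Item_ID"]:
    encoded[name] = encode_sorted(df[name])

print(encoded)
```

If the IDs should be numbered in order of first appearance instead, `pd.factorize` is the closer fit.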
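You can apply pd.factorize to the two ID columns (here the columns are labeled 0, 1 and 2). It numbers values in order of first appearance starting from 0, hence the + 1: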
In [244]:
df[[0,1]] = df[[0,1]].apply(lambda x: pd.factorize(x)[0] + 1)
df
Out[244]:
0 1 2
0 1 1 5
1 2 1 4
2 3 1 1
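For completeness, a self-contained sketch of the same factorize approach with named columns. It also keeps the unique values that `pd.factorize` returns, so the integer IDs can be mapped back to the original strings later. The column names and the sample frame are assumptions made for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "User_ID": ["A12VH45Q3H5R5I", "A3J8AQWNNI3WSN", "A1XOBWIL4MILVM"],
    "Item_ID": ["B000NWJTKW", "B000NWJTKW", "B000NWJTKW"],
    "Rating": [5.0, 4.0, 1.0],
})

# factorize returns (codes, uniques); codes are 0-based and follow
# the order in which each value first appears.
user_codes, user_index = pd.factorize(df["User_ID"])
item_codes, item_index = pd.factorize(df["Item_ID"])

encoded = pd.DataFrame({
    "user": user_codes + 1,   # shift to 1-based IDs
    "item": item_codes + 1,
    "rating": df["Rating"],
})

# Map the integer IDs back to the original strings when needed.
recovered_users = user_index.take(encoded["user"].to_numpy() - 1)

print(encoded)
```

Both calls work on whole columns at once, which is typically much faster than filling a dictionary row by row in a Python loop.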