I'm currently trying to drop duplicates according to two columns, but count the duplicates before they are dropped. I've managed to do this via
df_interactions = df_interactions.groupby(['user_id','item_tag_ids']).size().reset_index() \
.rename(columns={0:'interactions'})
but this leaves me with
user_id item_tag_ids interactions
0 170 71 1
1 170 325 1
2 170 387 1
3 170 474 1
4 170 526 2
It does what I want with respect to counting, adding a column, and dropping the duplicates, but how would I do this while retaining the original structure (plus a new column)? Adding more columns to groupby changes its behaviour.
Here is the original structure; I only want to group by the two ID columns:
user_id item_tag_ids item_timestamp
0 406225 7271 1483229353
1 406225 1183 1483229350
2 406225 5930 1483229350
3 406225 7162 1483229350
4 406225 7271 1483229350
I would like the item_timestamp field in the smaller dataframe to contain the first occurring timestamp for that combination.
You want to use transform, like the following, to keep your original data's shape. To get a list of all the item_timestamp values for each group, you can use groupby in combination with agg(list):
# First we create count column with transform
df['count'] = df.groupby(['user_id', 'item_tag_ids']).user_id.transform('size')
# After that, we merge the groupby result (timestamps aggregated into lists) back into the original dataframe
df = df.merge(df.groupby(['user_id', 'item_tag_ids']).item_timestamp.agg(list).reset_index(),
              on=['user_id', 'item_tag_ids'],
              how='left',
              suffixes=('_1', '')).drop('item_timestamp_1', axis=1)
print(df)
user_id item_tag_ids count item_timestamp
0 406225 7271 2 [1483229353, 1483229350]
1 406225 1183 1 [1483229350]
2 406225 5930 1 [1483229350]
3 406225 7162 1 [1483229350]
4 406225 7271 2 [1483229353, 1483229350]
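If you only need the first timestamp per combination rather than the full list, the merge step can probably be skipped, since transform also accepts 'first'. The following is a minimal sketch, assuming the five sample rows from the question; the column name first_timestamp is made up for illustration, and "first" here means first in row order (use 'min' if you want the earliest value instead).

```python
import pandas as pd

# Sample rows copied from the question's "original structure" table.
df = pd.DataFrame({
    "user_id": [406225] * 5,
    "item_tag_ids": [7271, 1183, 5930, 7162, 7271],
    "item_timestamp": [1483229353, 1483229350, 1483229350,
                       1483229350, 1483229350],
})

grouped = df.groupby(["user_id", "item_tag_ids"])["item_timestamp"]

# transform returns one value per original row, so the row count is unchanged.
counts = grouped.transform("size")
# 'first' picks the first value in row order within each group.
firsts = grouped.transform("first")

df["count"] = counts
df["first_timestamp"] = firsts  # hypothetical column name

print(df)
```

Every row keeps its own item_timestamp, and the two new columns repeat the group-level values on each row of the group.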
Explanation of .agg(list): it aggregates the values of each group into a list, like the following:
df.groupby(['user_id', 'item_tag_ids']).item_timestamp.agg(list).reset_index()
Out[39]:
user_id item_tag_ids item_timestamp
0 406225 1183 [1483229350]
1 406225 5930 [1483229350]
2 406225 7162 [1483229350]
3 406225 7271 [1483229353, 1483229350]
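The question also asks for a smaller dataframe with one row per combination, a count, and the first occurring timestamp. One way to sketch that is named aggregation, which needs pandas 0.25 or later. This assumes the sample rows from the question and reuses the interactions column name from the asker's own attempt; sort=False is my choice, to keep groups in order of first appearance.

```python
import pandas as pd

# Sample rows copied from the question's "original structure" table.
df = pd.DataFrame({
    "user_id": [406225] * 5,
    "item_tag_ids": [7271, 1183, 5930, 7162, 7271],
    "item_timestamp": [1483229353, 1483229350, 1483229350,
                       1483229350, 1483229350],
})

summary = (
    df.groupby(["user_id", "item_tag_ids"], as_index=False, sort=False)
      .agg(
          # number of rows in each (user_id, item_tag_ids) group
          interactions=("item_timestamp", "size"),
          # first timestamp in row order; swap in "min" for the earliest
          item_timestamp=("item_timestamp", "first"),
      )
)

print(summary)
```

This drops the duplicates and adds the count in one step, so no merge back into the original frame is needed.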