I have a pandas DataFrame with session IDs, timestamps and URLs in the following format:
SessionId  TimeStamp  URL
aa420858   20:24      url1
aa420858   20:26      url2
aa420858   20:27      url3
bb779bc3   18:18      other_url1
bb779bc3   18:21      other_url2
bb779bc3   18:24      other_url3
bb779bc3   18:25      other_url4
zz920853   20:27      diff_url1
zz920853   20:28      diff_url2
I need to get the following format:
SessionId  URL1        URL2        URL3        URL4        TimeStamp1  TimeStamp2  TimeStamp3  TimeStamp4
aa420858   url1        url2        url3                    20:24       20:26       20:27
bb779bc3   other_url1  other_url2  other_url3  other_url4  18:18       18:21       18:24       18:25
zz920853   diff_url1   diff_url2                           20:27       20:28
I never know in advance the number of URLs per session.
I tried pd.melt, pd.pivot_table, pivot(), unstack() and so on, but without success. Can someone please advise the best approach? Also, would it be possible to use the difference between timestamps to get the time spent on each page?
Thank you very much!
pivot_table + concat
import pandas as pd

# Number the rows within each session (0, 1, 2, ...) and pivot on that counter.
key = df.groupby('SessionId').cumcount()

# Each (SessionId, key) pair is unique, so 'first' simply picks the single value.
df1 = df.pivot_table(index='SessionId', columns=key, values='TimeStamp', aggfunc='first').add_prefix('TimeStamp_')
df2 = df.pivot_table(index='SessionId', columns=key, values='URL', aggfunc='first').add_prefix('URL_')
pd.concat([df2, df1], axis=1).reset_index()
Out[209]:
  SessionId       URL_0       URL_1       URL_2       URL_3 TimeStamp_0  \
0  aa420858        url1        url2        url3         NaN       20:24
1  bb779bc3  other_url1  other_url2  other_url3  other_url4       18:18
2  zz920853   diff_url1   diff_url2         NaN         NaN       20:27

  TimeStamp_1 TimeStamp_2 TimeStamp_3
0       20:26       20:27         NaN
1       18:21       18:24       18:25
2       20:28         NaN         NaN
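The trick is the per-session counter from groupby().cumcount(): it restarts at 0 for every session and becomes the new column index. On the sample data it produces:

df.groupby('SessionId').cumcount()
# 0    0
# 1    1
# 2    2
# 3    0
# 4    1
# 5    2
# 6    3
# 7    0
# 8    1
# dtype: int64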
P.S. If you need the numbering to start from 1 instead of 0, add .add(1): df.groupby('SessionId').cumcount().add(1).
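For instance, a minimal variation (assuming the same df as above) that reproduces the exact headers from the question, URL1 ... TimeStamp4:

key = df.groupby('SessionId').cumcount().add(1)  # 1-based counter
urls = df.pivot_table(index='SessionId', columns=key,
                      values='URL', aggfunc='first').add_prefix('URL')
times = df.pivot_table(index='SessionId', columns=key,
                       values='TimeStamp', aggfunc='first').add_prefix('TimeStamp')
pd.concat([urls, times], axis=1).reset_index()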
DIFF of timestamps
# Minutes between consecutive hits; the first hit of each session is NaN.
df['DIFF'] = df.groupby('SessionId')['TimeStamp'].transform(
    lambda x: pd.to_datetime(x).diff().dt.total_seconds() / 60)
df3 = df.dropna(subset=['DIFF'])  # drop the first hit of each session
df3.pivot_table(index='SessionId', columns=df3.groupby('SessionId').cumcount(),
                values='DIFF', aggfunc='first').add_prefix('diff')
Out[241]:
           diff0  diff1  diff2
SessionId
aa420858     2.0    1.0    NaN
bb779bc3     3.0    3.0    1.0
zz920853     1.0    NaN    NaN
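For completeness, a self-contained sketch that rebuilds the sample frame from the question and runs both steps; it assumes all timestamps fall on the same day, since only HH:MM is given:

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'SessionId': ['aa420858'] * 3 + ['bb779bc3'] * 4 + ['zz920853'] * 2,
    'TimeStamp': ['20:24', '20:26', '20:27',
                  '18:18', '18:21', '18:24', '18:25',
                  '20:27', '20:28'],
    'URL': ['url1', 'url2', 'url3',
            'other_url1', 'other_url2', 'other_url3', 'other_url4',
            'diff_url1', 'diff_url2'],
})

# Wide layout: one column per hit within the session.
key = df.groupby('SessionId').cumcount()
wide = pd.concat(
    [df.pivot_table(index='SessionId', columns=key, values='URL',
                    aggfunc='first').add_prefix('URL_'),
     df.pivot_table(index='SessionId', columns=key, values='TimeStamp',
                    aggfunc='first').add_prefix('TimeStamp_')],
    axis=1).reset_index()
print(wide)

# Time on page in minutes (NaN for the first hit of each session).
df['DIFF'] = df.groupby('SessionId')['TimeStamp'].transform(
    lambda x: pd.to_datetime(x, format='%H:%M').diff().dt.total_seconds() / 60)
print(df)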