
Pandas calculate based on multiple rows and conditions

Tags: python, pandas

I'm new to pandas. I need to calculate the time spent by each person at each location, per day, and drop rows that have no matching pair on the same date. My data looks like this:

   Unit       Name Location        Date   Time
0    K1  Somebody1     LOC1  2020-05-12  07:00
1    K1  Somebody1     LOC1  2020-05-12  20:10
2    K1  Somebody1     LOC1  2020-05-13  06:00
3    K1  Somebody1     LOC1  2020-05-13  20:00
4    K1  Somebody1     LOC1  2020-05-14  06:37
5    K1  Somebody1     LOC2  2020-05-15  07:00
6    K1  Somebody1     LOC2  2020-05-15  20:10
7    K1  Somebody1     LOC2  2020-05-16  06:00
8    K1  Somebody1     LOC2  2020-05-16  20:00
9    K1  Somebody1     LOC2  2020-05-17  06:37
10   K1  Somebody2     LOC2  2020-05-13  07:00
11   K1  Somebody2     LOC2  2020-05-14  10:10
12   K1  Somebody2     LOC2  2020-05-14  16:50
13   K1  Somebody2     LOC2  2020-05-15  05:36
14   K1  Somebody3     LOC1  2020-05-13  07:00
15   K1  Somebody3     LOC1  2020-05-14  10:10
16   K1  Somebody3     LOC1  2020-05-14  16:50
17   K1  Somebody3     LOC1  2020-05-15  05:36

So far I have only managed to convert the Time column to datetime.time objects with:

from datetime import datetime

df['Time'] = df['Time'].apply(lambda x: datetime.strptime(x, '%H:%M').time())

I tried pivot tables, groupby, and for loops, and I'm out of ideas. I want the output to look like this:

LOC1
      Somebody1  2020-05-12  13h 10m
                 2020-05-13  14h 00m
TOTAL                        27h 10m
      Somebody2  date        hours
                 date        hours
TOTAL                        sum for somebody2
      Somebody3  date        hours
                 date        hours
TOTAL                        sum for somebody3

LOC2
      Somebody1  date        hours
                 date        hours
TOTAL                        sum for somebody1
      Somebody2  date        hours   
                 date        hours
TOTAL                        sum for somebody2

or something similar

Abdul Alhazred asked Dec 07 '25 08:12



1 Answer

If I understand correctly, you can use groupby and then combine_first:

import numpy as np
import pandas as pd

# Date and Time must still be strings here (i.e. before any .time() conversion)
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])

# first and last timestamp per person, location and calendar day
df1 = df.groupby(['Name', 'Location', df['datetime'].dt.normalize()]).agg(
    start=('datetime', 'first'),
    end=('datetime', 'last'),
)

# time spent per day, in hours
df1['timespent'] = (df1['end'] - df1['start']) / np.timedelta64(1, 'h')

# create total row.
m = df1.unstack(['Name', 'Location'])['timespent'].sum().unstack()
m = m.assign(TOTAL=m.sum(axis=1)).stack().to_frame('timespent')

final = df1.drop(['start', 'end'], axis=1).combine_first(m)

# if you want to remove single-entry days
final[final['timespent'] > 0]

                               timespent
Name      Location datetime             
Somebody1 LOC1     2020-05-12  13.166667
                   2020-05-13  14.000000
          TOTAL    NaT         27.166667
Somebody2 LOC2     2020-05-14   6.666667
          TOTAL    NaT          6.666667
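
If you want the "13h 10m" style from the question rather than decimal hours, one option is to keep the durations as Timedelta values and format them at the end. The following is only a minimal sketch, not part of the code above: it uses a small slice of the question's data, the names `daily`, `totals` and `fmt` are made up for illustration, and it assumes each calendar day is handled on its own (so shifts that cross midnight are not paired). It takes min/max instead of first/last so the rows don't need to be sorted, and it drops days with fewer than two entries by counting rows.

```python
import pandas as pd

# Small slice of the question's data: Somebody1 at LOC1, Somebody2 at LOC2.
df = pd.DataFrame({
    "Name": ["Somebody1"] * 5 + ["Somebody2"] * 4,
    "Location": ["LOC1"] * 5 + ["LOC2"] * 4,
    "Date": ["2020-05-12", "2020-05-12", "2020-05-13", "2020-05-13",
             "2020-05-14", "2020-05-13", "2020-05-14", "2020-05-14",
             "2020-05-15"],
    "Time": ["07:00", "20:10", "06:00", "20:00", "06:37",
             "07:00", "10:10", "16:50", "05:36"],
})

# Combine the two string columns into a single timestamp per row.
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Earliest and latest timestamp, plus row count, per person/location/day.
daily = (
    df.groupby(["Name", "Location", "Date"])["datetime"]
      .agg(start="min", end="max", n="count")
      .reset_index()
)

# Drop days that have no pair (fewer than two entries).
daily = daily[daily["n"] >= 2].copy()
daily["spent"] = daily["end"] - daily["start"]  # Timedelta column


def fmt(td):
    """Format a Timedelta as hours and minutes, e.g. '13h 10m'."""
    minutes = int(td.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"


daily["spent_fmt"] = daily["spent"].map(fmt)

# Total per location and person, formatted the same way.
totals = daily.groupby(["Location", "Name"])["spent"].sum().map(fmt)

print(daily[["Location", "Name", "Date", "spent_fmt"]])
print(totals)
```

Summing the Timedelta column before formatting avoids rounding issues from adding decimal hours, and the totals come out in the same "Xh Ym" form as the daily rows.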
Umar.H answered Dec 09 '25 22:12



