 

How to groupby two fields in pandas?

Given the following input, the goal is to group values by hour for each Date, with Avg and Sum aggregations. A solution for grouping by hour alone is available, but it does not take new days into account. (A runnable reproduction of the sample is sketched after the expected output below.)

Date        Time    F1  F2  F3
21-01-16    8:11    5   2   4
21-01-16    9:25    9   8   2
21-01-16    9:39    7   3   2
21-01-16    9:53    6   5   1
21-01-16    10:07   4   6   7
21-01-16    10:21   7   3   1
21-01-16    10:35   5   6   7
21-01-16    11:49   1   2   1
21-01-16    12:03   3   3   1
22-01-16    9:45    6   5   1
22-01-16    9:20    4   6   7
22-01-16    12:10   7   3   1

Expected output:

Date,Time,SUM F1,SUM F2,SUM F3,AVG F1,AVG F2,AVG F3
21-01-16,8:00,5,2,4,5,2,4
21-01-16,9:00,22,16,5,7.3,5.3,1.6
21-01-16,10:00,16,15,15,5.3,5,5
21-01-16,11:00,1,2,1,1,2,1
21-01-16,12:00,3,3,1,3,3,1
22-01-16,9:00,10,11,8,5,5.5,4
22-01-16,12:00,7,3,1,7,3,1
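
For anyone who wants to run the answers below locally, here is a minimal sketch that loads the sample above from an in-memory string. The file name f123_dates.csv used in the answers is an assumption about how this data is stored on disk; the in-memory route below sidesteps that.

# Hedged sketch: build the sample DataFrame from a string instead of a file.
import io
import pandas as pd

sample = """\
Date Time F1 F2 F3
21-01-16 8:11 5 2 4
21-01-16 9:25 9 8 2
21-01-16 9:39 7 3 2
21-01-16 9:53 6 5 1
21-01-16 10:07 4 6 7
21-01-16 10:21 7 3 1
21-01-16 10:35 5 6 7
21-01-16 11:49 1 2 1
21-01-16 12:03 3 3 1
22-01-16 9:45 6 5 1
22-01-16 9:20 4 6 7
22-01-16 12:10 7 3 1
"""

df = pd.read_csv(io.StringIO(sample), sep=r'\s+')
print(df.head())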
asked by user52028778

2 Answers

You can parse the dates while reading the CSV file:

from __future__ import print_function  # make print() work with Python 2 and 3
import pandas as pd

df = pd.read_csv('f123_dates.csv', index_col=0, parse_dates=[0, 1],
                 delim_whitespace=True)
print(df.groupby([df.index, df.Time.dt.hour]).agg(['mean', 'sum']))

Output:

                       F1            F2            F3    
                     mean sum      mean sum      mean sum
Date       Time                                          
2016-01-21 8     5.000000   5  2.000000   2  4.000000   4
           9     7.333333  22  5.333333  16  1.666667   5
           10    5.333333  16  5.000000  15  5.000000  15
           11    1.000000   1  2.000000   2  1.000000   1
           12    3.000000   3  3.000000   3  1.000000   1
2016-01-22 9     5.000000  10  5.500000  11  4.000000   8
           12    7.000000   7  3.000000   3  1.000000   1

Going all the way to CSV:

from __future__ import print_function
import pandas as pd

df = pd.read_csv('f123_dates.csv', index_col=0, parse_dates=[0, 1],
                 delim_whitespace=True)
df2 = df.groupby([df.index, df.Time.dt.hour]).agg(['mean', 'sum'])
df3 = df2.reset_index()
# Flatten the two-level column index into single names like "F1 mean".
df3.columns = [' '.join(col).strip() for col in df3.columns.values]
print(df3.to_csv(columns=df3.columns, index=False))

Output:

Date,Time,F1 mean,F1 sum,F2 mean,F2 sum,F3 mean,F3 sum
2016-01-21,8,5.0,5,2.0,2,4.0,4
2016-01-21,9,7.333333333333333,22,5.333333333333333,16,1.6666666666666667,5
2016-01-21,10,5.333333333333333,16,5.0,15,5.0,15
2016-01-21,11,1.0,1,2.0,2,1.0,1
2016-01-21,12,3.0,3,3.0,3,1.0,1
2016-01-22,9,5.0,10,5.5,11,4.0,8
2016-01-22,12,7.0,7,3.0,3,1.0,1 
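
If the exact header layout from the question is wanted (SUM/AVG prefixes, sums before averages, and the hour printed as 8:00), the flattened frame can be reshaped a little further. A minimal sketch, assuming df3 from the snippet above; the helper names out and cols are just for illustration:

# Rename "F1 mean" -> "AVG F1" and "F1 sum" -> "SUM F1",
# and print the hour as e.g. "8:00" to mirror the expected output.
out = df3.copy()
out.columns = ['AVG ' + c[:-5] if c.endswith(' mean')
               else 'SUM ' + c[:-4] if c.endswith(' sum')
               else c
               for c in out.columns]
out['Time'] = out['Time'].astype(str) + ':00'
# Put the sums before the averages, as in the question.
cols = (['Date', 'Time']
        + [c for c in out.columns if c.startswith('SUM ')]
        + [c for c in out.columns if c.startswith('AVG ')])
print(out[cols].to_csv(index=False))

The averages are still printed with full float precision; the question's 7.3 etc. look rounded, so apply .round(1) to the AVG columns if that shorter form is preferred.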
answered by Mike Müller

You can convert Time to datetime with to_datetime and then use groupby with agg:

print(df)
         Date   Time  F1  F2  F3
0  2016-01-21   8:11   5   2   4
1  2016-01-21   9:25   9   8   2
2  2016-01-21   9:39   7   3   2
3  2016-01-21   9:53   6   5   1
4  2016-01-21  10:07   4   6   7
5  2016-01-21  10:21   7   3   1
6  2016-01-21  10:35   5   6   7
7  2016-01-21  11:49   1   2   1
8  2016-01-21  12:03   3   3   1
9  2016-01-22   9:45   6   5   1
10 2016-01-22   9:20   4   6   7
11 2016-01-22  12:10   7   3   1

df['Time'] = pd.to_datetime(df['Time'], format="%H:%M")
print(df)
         Date                Time  F1  F2  F3
0  2016-01-21 1900-01-01 08:11:00   5   2   4
1  2016-01-21 1900-01-01 09:25:00   9   8   2
2  2016-01-21 1900-01-01 09:39:00   7   3   2
3  2016-01-21 1900-01-01 09:53:00   6   5   1
4  2016-01-21 1900-01-01 10:07:00   4   6   7
5  2016-01-21 1900-01-01 10:21:00   7   3   1
6  2016-01-21 1900-01-01 10:35:00   5   6   7
7  2016-01-21 1900-01-01 11:49:00   1   2   1
8  2016-01-21 1900-01-01 12:03:00   3   3   1
9  2016-01-22 1900-01-01 09:45:00   6   5   1
10 2016-01-22 1900-01-01 09:20:00   4   6   7
11 2016-01-22 1900-01-01 12:10:00   7   3   1
df = df.groupby([df['Date'], df['Time'].dt.hour]).agg(['mean','sum']).reset_index()
print(df)
        Date Time        F1            F2            F3    
                       mean sum      mean sum      mean sum
0 2016-01-21    8  5.000000   5  2.000000   2  4.000000   4
1 2016-01-21    9  7.333333  22  5.333333  16  1.666667   5
2 2016-01-21   10  5.333333  16  5.000000  15  5.000000  15
3 2016-01-21   11  1.000000   1  2.000000   2  1.000000   1
4 2016-01-21   12  3.000000   3  3.000000   3  1.000000   1
5 2016-01-22    9  5.000000  10  5.500000  11  4.000000   8
6 2016-01-22   12  7.000000   7  3.000000   3  1.000000   1

Then you can flatten the column names with a list comprehension:

# Join the two column levels into names like "F1 mean"; strip() removes
# the trailing space left on single-level names such as "Date".
df.columns = [' '.join(col).strip() for col in df.columns.values]
print(df)

        Date   Time   F1 mean  F1 sum   F2 mean  F2 sum   F3 mean  F3 sum
0 2016-01-21      8  5.000000       5  2.000000       2  4.000000       4
1 2016-01-21      9  7.333333      22  5.333333      16  1.666667       5
2 2016-01-21     10  5.333333      16  5.000000      15  5.000000      15
3 2016-01-21     11  1.000000       1  2.000000       2  1.000000       1
4 2016-01-21     12  3.000000       3  3.000000       3  1.000000       1
5 2016-01-22      9  5.000000      10  5.500000      11  4.000000       8
6 2016-01-22     12  7.000000       7  3.000000       3  1.000000       1
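
As a hedged aside, not part of the answer above: the two columns can also be merged into a single timestamp and floored to the hour, which avoids both the placeholder 1900-01-01 date and the two-level group key. The file name f123_dates.csv is taken from the first answer, and DateTime is a hypothetical helper column:

import pandas as pd

# Read the raw strings and build one real timestamp per row.
raw = pd.read_csv('f123_dates.csv', sep=r'\s+')
raw['DateTime'] = pd.to_datetime(raw['Date'] + ' ' + raw['Time'], dayfirst=True)

# Floor each timestamp to the start of its hour and aggregate;
# only hours that actually occur in the data become groups.
hourly = (raw.groupby(raw['DateTime'].dt.floor('H'))[['F1', 'F2', 'F3']]
             .agg(['mean', 'sum']))
print(hourly)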
answered by jezrael

