Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

New column with a value inside a group

Tags:

python

       A           B    C
0    Red  2002-01-13  3.9
1    Red  2002-01-13  4.1
2    Red  2002-01-13  2.3
3    Red  2002-01-14  0.7
4    Red  2002-01-14  5.9
5    Red  2002-01-14  3.0
6    Red  2002-01-15  6.9
7    Red  2002-01-15  6.4
8    Red  2002-01-15  9.9
9   Blue  2006-07-21  7.2
10  Blue  2006-07-21  4.2
11  Blue  2006-07-21  6.1
12  Blue  2006-07-22  0.1
13  Blue  2006-07-22  3.2
14  Blue  2006-07-22  2.7

I want to create a df['D'] column, with next condition:

  • Take the last C value of the first B date, by each A group.

I mean, for A group Red, take 2.3, as it is the last C value of first B date 2002-01-13. For A group Blue, take 6.1, as it is the last C value of first B date 2006-07-21.

Output should be:

       A           B    C    D
0    Red  2002-01-13  3.9  2.3
1    Red  2002-01-13  4.1  2.3
2    Red  2002-01-13  2.3  2.3
3    Red  2002-01-14  0.7  2.3
4    Red  2002-01-14  5.9  2.3
5    Red  2002-01-14  3.0  2.3
6    Red  2002-01-15  6.9  2.3
7    Red  2002-01-15  6.4  2.3
8    Red  2002-01-15  9.9  2.3
9   Blue  2006-07-21  7.2  6.1
10  Blue  2006-07-21  4.2  6.1
11  Blue  2006-07-21  6.1  6.1
12  Blue  2006-07-22  0.1  6.1
13  Blue  2006-07-22  3.2  6.1
14  Blue  2006-07-22  2.7  6.1

I've tried:

df['D'] = df.groupby('A')["C"].transform('last')
like image 719
Tie_24 Avatar asked Nov 21 '25 13:11

Tie_24


1 Answers

Start by sorting the dataframe in the reverse order of Bs, grouping by A, and taking the last value from each group:

dfd = df.sort_values('B', ascending=False).groupby('A').last()\
                               .reset_index()

Then combine the new dataframe with the original dataframe and select unique columns:

df = df.merge(dfd, on=('A'))[['A','B_x','C_x','C_y']]
#       A         B_x  C_x  C_y
#0    Red  2002-01-13  3.9  2.3
#1    Red  2002-01-13  4.1  2.3
#2    Red  2002-01-13  2.3  2.3
#3    Red  2002-01-14  0.7  2.3
#4    Red  2002-01-14  5.9  2.3
#5    Red  2002-01-14  3.0  2.3
#6    Red  2002-01-15  6.9  2.3
#7    Red  2002-01-15  6.4  2.3
#8    Red  2002-01-15  9.9  2.3
#9   Blue  2006-07-21  7.2  6.1
#10  Blue  2006-07-21  4.2  6.1
#11  Blue  2006-07-21  6.1  6.1
#12  Blue  2006-07-22  0.1  6.1
#13  Blue  2006-07-22  3.2  6.1
#14  Blue  2006-07-22  2.7  6.1

You can rename the columns back if you want:

df.columns = 'A','B','C','D'

Alternative implementation (without merging):

dfd = df.sort_values('B', ascending=False).groupby('A').last()
dfd = dfd.set_index(['A','B'])
df = df.set_index(['A','B'])
df['D'] = dfd
df.fillna(method='ffill', inplace=True)
df.reset_index()
#       A           B    C    D
#0    Red  2002-01-13  3.9  2.3
#1    Red  2002-01-13  4.1  2.3
#2    Red  2002-01-13  2.3  2.3
#3    Red  2002-01-14  0.7  2.3
#4    Red  2002-01-14  5.9  2.3
#5    Red  2002-01-14  3.0  2.3
#6    Red  2002-01-15  6.9  2.3
#7    Red  2002-01-15  6.4  2.3
#8    Red  2002-01-15  9.9  2.3
#9   Blue  2006-07-21  7.2  6.1
#10  Blue  2006-07-21  4.2  6.1
#11  Blue  2006-07-21  6.1  6.1
#12  Blue  2006-07-22  0.1  6.1
#13  Blue  2006-07-22  3.2  6.1
#14  Blue  2006-07-22  2.7  6.1
like image 147
DYZ Avatar answered Nov 24 '25 04:11

DYZ