A B C
0 Red 2002-01-13 3.9
1 Red 2002-01-13 4.1
2 Red 2002-01-13 2.3
3 Red 2002-01-14 0.7
4 Red 2002-01-14 5.9
5 Red 2002-01-14 3.0
6 Red 2002-01-15 6.9
7 Red 2002-01-15 6.4
8 Red 2002-01-15 9.9
9 Blue 2006-07-21 7.2
10 Blue 2006-07-21 4.2
11 Blue 2006-07-21 6.1
12 Blue 2006-07-22 0.1
13 Blue 2006-07-22 3.2
14 Blue 2006-07-22 2.7
I want to create a df['D'] column, with next condition:
C value of the first B date, by each A group.I mean, for A group Red, take 2.3, as it is the last C value of first B date 2002-01-13. For A group Blue, take 6.1, as it is the last C value of first B date 2006-07-21.
Output should be:
A B C D
0 Red 2002-01-13 3.9 2.3
1 Red 2002-01-13 4.1 2.3
2 Red 2002-01-13 2.3 2.3
3 Red 2002-01-14 0.7 2.3
4 Red 2002-01-14 5.9 2.3
5 Red 2002-01-14 3.0 2.3
6 Red 2002-01-15 6.9 2.3
7 Red 2002-01-15 6.4 2.3
8 Red 2002-01-15 9.9 2.3
9 Blue 2006-07-21 7.2 6.1
10 Blue 2006-07-21 4.2 6.1
11 Blue 2006-07-21 6.1 6.1
12 Blue 2006-07-22 0.1 6.1
13 Blue 2006-07-22 3.2 6.1
14 Blue 2006-07-22 2.7 6.1
I've tried:
df['D'] = df.groupby('A')["C"].transform('last')
Start by sorting the dataframe in the reverse order of Bs, grouping by A, and taking the last value from each group:
dfd = df.sort_values('B', ascending=False).groupby('A').last()\
.reset_index()
Then combine the new dataframe with the original dataframe and select unique columns:
df = df.merge(dfd, on=('A'))[['A','B_x','C_x','C_y']]
# A B_x C_x C_y
#0 Red 2002-01-13 3.9 2.3
#1 Red 2002-01-13 4.1 2.3
#2 Red 2002-01-13 2.3 2.3
#3 Red 2002-01-14 0.7 2.3
#4 Red 2002-01-14 5.9 2.3
#5 Red 2002-01-14 3.0 2.3
#6 Red 2002-01-15 6.9 2.3
#7 Red 2002-01-15 6.4 2.3
#8 Red 2002-01-15 9.9 2.3
#9 Blue 2006-07-21 7.2 6.1
#10 Blue 2006-07-21 4.2 6.1
#11 Blue 2006-07-21 6.1 6.1
#12 Blue 2006-07-22 0.1 6.1
#13 Blue 2006-07-22 3.2 6.1
#14 Blue 2006-07-22 2.7 6.1
You can rename the columns back if you want:
df.columns = 'A','B','C','D'
Alternative implementation (without merging):
dfd = df.sort_values('B', ascending=False).groupby('A').last()
dfd = dfd.set_index(['A','B'])
df = df.set_index(['A','B'])
df['D'] = dfd
df.fillna(method='ffill', inplace=True)
df.reset_index()
# A B C D
#0 Red 2002-01-13 3.9 2.3
#1 Red 2002-01-13 4.1 2.3
#2 Red 2002-01-13 2.3 2.3
#3 Red 2002-01-14 0.7 2.3
#4 Red 2002-01-14 5.9 2.3
#5 Red 2002-01-14 3.0 2.3
#6 Red 2002-01-15 6.9 2.3
#7 Red 2002-01-15 6.4 2.3
#8 Red 2002-01-15 9.9 2.3
#9 Blue 2006-07-21 7.2 6.1
#10 Blue 2006-07-21 4.2 6.1
#11 Blue 2006-07-21 6.1 6.1
#12 Blue 2006-07-22 0.1 6.1
#13 Blue 2006-07-22 3.2 6.1
#14 Blue 2006-07-22 2.7 6.1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With