Suppose I have a dataframe as below:
df1=
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 NaN
2 A2 B0 C0 NaN
3 A3 B2 C2 NaN
4 A4 B2 C2 2,3
5 A5 B3 C3 NaN
6 A6 B3 C3 NaN
I want the result to be
df1=
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 1,1
2 A2 B0 C0 1,1
3 A3 B2 C2 2,3
4 A4 B2 C2 2,3
5 A5 B3 C3 NaN
6 A6 B3 C3 NaN
I want to update coordinates with the same street and city.
In the above example (B0,C0) at index 0 has coordinates (1,1). So I need to update coordinates at indices 1 and 2 to (1,1) since they have same street and city(B0,C0).
What is the best way to achieve this?
Also how do I update all the dataframes in similar fashion if we are given a list of dataframes. i.e
df_list = [df1,df2,..]
Is it a good idea to first generate a dataframe with unique rows from all the dataframes and then use this dataframe for look-up and update each dataframe?
If only one non NaN value in each group is possible use sort_values with ffill (Series.fillna with method='ffill'):
df = df.sort_values(['street','city','coordinates'])
df['coordinates'] = df['coordinates'].ffill()
print (df)
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 1,1
2 A2 B0 C0 1,1
4 A4 B2 C2 2,3
3 A3 B2 C2 2,3
5 A5 B2 C2 2,3
5 A6 B2 C2 2,3
Solution with GroupBy.transform with dropna:
df['coordinates'] = df.groupby(['street','city'])['coordinates']
.transform(lambda x: x.dropna())
print (df)
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 1,1
2 A2 B0 C0 1,1
3 A3 B2 C2 2,3
4 A4 B2 C2 2,3
5 A5 B2 C2 2,3
5 A6 B2 C2 2,3
Or ffill with bfill:
df['coordinates'] = df.groupby(['street','city'])['coordinates']
.transform(lambda x: x.ffill().bfill())
print (df)
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 1,1
2 A2 B0 C0 1,1
3 A3 B2 C2 2,3
4 A4 B2 C2 2,3
5 A5 B2 C2 2,3
5 A6 B2 C2 2,3
Second solution works with multiple values also - first forward fill values per group (not replace first values, stay NaN) and then all first values replace by back filling:
print (df)
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 NaN
2 A2 B0 C0 NaN
3 A3 B2 C2 NaN
4 A4 B2 C2 2,3
5 A5 B2 C2 4,7
5 A6 B2 C2 NaN
df['coordinates'] = df.groupby(['street','city'])['coordinates']
.transform(lambda x: x.ffill().bfill())
print (df)
name street city coordinates
0 A0 B0 C0 1,1
1 A1 B0 C0 1,1
2 A2 B0 C0 1,1
3 A3 B2 C2 2,3
4 A4 B2 C2 2,3
5 A5 B2 C2 4,7
5 A6 B2 C2 4,7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With