I am trying to aggregate the DataFrame below so that there is one row per date (for each group).
     Cod1       Cod2    Date     E   A       S
327 100013.0    001 2019-02-01  0.0 0.0    511.0
323 100013.0    001 2019-02-01  0.0 -14.0   NaN
336 100013.0    001 2019-02-02  0.0 -28.0   NaN
341 100013.0    001 2019-02-03  0.0 -6.0    NaN
350 100013.0    001 2019-02-03  0.0 -3.0    NaN
373 100013.0    001 2019-02-07  0.0 -15.0   0
377 100013.0    001 2019-02-07  0.0 -9.0    NaN
Using the following:
df = df.groupby(['Date', 'Cod1', 'Cod2'])[['E', 'A', 'S']].sum()
I got the following output:
2019-02-01  100013.0    001 0.0 -14.0   511.0
2019-02-02  100013.0    001 0.0 -28.0   0.0
2019-02-03  100013.0    001 0.0 -9.0    0.0
2019-02-07  100013.0    001 0.0 -24.0   0.0
My question is about three scenarios:
1-) Two rows on the same date, where the last column has a NaN and a non-null number:
327 100013.0    001 2019-02-01  0.0 0.0    511.0
323 100013.0    001 2019-02-01  0.0 -14.0   NaN
In this situation I would always like to keep the number.
2-) Two rows on the same date, where the last column has two NaNs:
341 100013.0    001 2019-02-03  0.0 -6.0    NaN
350 100013.0    001 2019-02-03  0.0 -3.0    NaN
In this situation I would always like to keep the NaN.
3-) Two rows on the same date, where the last column has one zero and one NaN:
373 100013.0    001 2019-02-07  0.0 -15.0   0
377 100013.0    001 2019-02-07  0.0 -9.0    NaN
In this situation I would always like to keep the 0.
So my expected output should be:
2019-02-01  100013.0    001 0.0 -14.0   511.0
2019-02-02  100013.0    001 0.0 -28.0   NaN
2019-02-03  100013.0    001 0.0 -9.0    NaN
2019-02-07  100013.0    001 0.0 -24.0   0.0
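For reference, here is a minimal sketch that rebuilds the sample frame above (dtypes are assumed from the display; Date is kept as strings, and pd.to_datetime(df['Date']) would convert it):

import numpy as np
import pandas as pd

# Reconstruction of the sample data, with NaN in the S column
df = pd.DataFrame(
    {'Cod1': [100013.0] * 7,
     'Cod2': ['001'] * 7,
     'Date': ['2019-02-01', '2019-02-01', '2019-02-02', '2019-02-03',
              '2019-02-03', '2019-02-07', '2019-02-07'],
     'E': [0.0] * 7,
     'A': [0.0, -14.0, -28.0, -6.0, -3.0, -15.0, -9.0],
     'S': [511.0, np.nan, np.nan, np.nan, np.nan, 0.0, np.nan]},
    index=[327, 323, 336, 341, 350, 373, 377])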
Check the min_count parameter of sum:
df.groupby(['Date', 'Cod1', 'Cod2'])[['E', 'A', 'S']].sum(min_count=1)
Out[260]: 
                            E     A      S
Date       Cod1     Cod2                  
2019-02-01 100013.0 1     0.0 -14.0  511.0
2019-02-02 100013.0 1     0.0 -28.0    NaN
2019-02-03 100013.0 1     0.0  -9.0    NaN
2019-02-07 100013.0 1     0.0 -24.0    0.0
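This works because sum() skips NaN by default, so a group containing only NaNs collapses to 0.0; min_count=1 requires at least one non-NA value and returns NaN otherwise, which covers all three scenarios. A quick illustration:

import numpy as np
import pandas as pd

pd.Series([np.nan, np.nan]).sum()             # 0.0 -- NaNs skipped, empty sum
pd.Series([np.nan, np.nan]).sum(min_count=1)  # nan -- no non-NA value present
pd.Series([0.0, np.nan]).sum(min_count=1)     # 0.0 -- the zero counts and is kept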
I guess a custom function can do it:
import numpy as np

(df.groupby(['Date', 'Cod1', 'Cod2'])
    [['E', 'A', 'S']]
    .agg(lambda x: np.nan if x.isna().all() else x.sum())
)
Output:
                            E     A      S
Date       Cod1     Cod2                  
2019-02-01 100013.0 1     0.0 -14.0  511.0
2019-02-02 100013.0 1     0.0 -28.0    NaN
2019-02-03 100013.0 1     0.0  -9.0    NaN
2019-02-07 100013.0 1     0.0 -24.0    0.0
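Note that writing x.sum(min_count=1) inside the lambda would be equivalent, since Series.sum also accepts min_count; that is exactly the shortcut the first answer uses.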