
What does .transform('first') do?

Somebody helped me with some code. I understood everything in it except the very last line, .transform('first'). I can see what it does to the output, but I'd like to know precisely what it does behind the scenes to produce this result.

This is the part of the code I understand:

df['Date'] = pd.to_datetime(df['Date'])
df['YEP'] = ( df[::-1].loc[df['Type'].eq('Budget')]
                     .groupby(df['Date'].dt.year)
                     .Value
                     .cumsum()
                     .sub(df['Value'])
                     .add(df['YTD'])
)

This is the output of this first part:

    Value    Type       Date    YTD     YEP
0     100  Budget 2019-01-01  101.0   974.0
1      50  Budget 2019-02-01  199.0  1022.0
2      20  Budget 2019-03-01  275.0  1078.0
3     123  Budget 2019-04-01  332.0  1012.0
4      56  Budget 2019-05-01    NaN     NaN
5      76  Budget 2019-06-01    NaN     NaN
6      98  Budget 2019-07-01    NaN     NaN
7     126  Budget 2019-08-01    NaN     NaN
8      90  Budget 2019-09-01    NaN     NaN
9      80  Budget 2019-10-01    NaN     NaN
10     67  Budget 2019-11-01    NaN     NaN
11     87  Budget 2019-12-01    NaN     NaN
12    101  Actual 2019-01-01  101.0     NaN
13     98  Actual 2019-02-01  199.0     NaN
14     76  Actual 2019-03-01  275.0     NaN
15     57  Actual 2019-04-01  332.0     NaN

This is the entire code:

df['Date'] = pd.to_datetime(df['Date'])
df['YEP'] = ( df[::-1].loc[df['Type'].eq('Budget')]
                     .groupby(df['Date'].dt.year)
                     .Value
                     .cumsum()
                     .sub(df['Value'])
                     .add(df['YTD'])
                     .groupby(df['Date'])
                     .transform('first') )

I got this after running the entire code:

    Value    Type       Date    YTD     YEP
0     100  Budget 2019-01-01  101.0   974.0
1      50  Budget 2019-02-01  199.0  1022.0
2      20  Budget 2019-03-01  275.0  1078.0
3     123  Budget 2019-04-01  332.0  1012.0
4      56  Budget 2019-05-01    NaN     NaN
5      76  Budget 2019-06-01    NaN     NaN
6      98  Budget 2019-07-01    NaN     NaN
7     126  Budget 2019-08-01    NaN     NaN
8      90  Budget 2019-09-01    NaN     NaN
9      80  Budget 2019-10-01    NaN     NaN
10     67  Budget 2019-11-01    NaN     NaN
11     87  Budget 2019-12-01    NaN     NaN
12    101  Actual 2019-01-01  101.0   974.0
13     98  Actual 2019-02-01  199.0  1022.0
14     76  Actual 2019-03-01  275.0  1078.0
15     57  Actual 2019-04-01  332.0  1012.0

I know that transform is similar to apply, but I don't understand what it means to apply (or transform) with the parameter 'first'. What does 'first' do here when combined with transform?

alexnesov asked Nov 16 '25 08:11


1 Answer

  1. What does 'first' mean?

    The argument of the .transform() method may be a NumPy function, a string function name, or a user-defined function. In the line

    .transform('first')

    it is a string function name, so it refers to the function first().

  2. Where is the function first() coming from?

    It is the GroupBy method .first().

  3. What does the function first() return?

    For each group, it returns the first non-NaN value in the series, or NaN if the group has none.

  4. What does the method .transform() do?

    It applies the function passed as its argument to every column (i.e. every series) of each group of the dataframe to obtain a new, transformed column. It then returns a dataframe made up of these transformed columns.

    For a series, it returns a transformed series.

  5. Does that mean the function passed to .transform() must return a series of the same size?

    No, that is only one possibility.

    The other is a scalar, which is broadcast (repeated) to make a series of the same size.

    The function used here (the GroupBy method first()) is a good example of such a function.

  6. So what does the method .transform('first') return?

    It returns a series or dataframe with the same shape as the source group chunk, in which every value in each column (NaN entries included) is replaced with the first non-NaN value in that column, or with NaN if there is none.
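To make points 3 to 6 concrete, here is a minimal sketch on toy data. The series `s`, the grouping keys `keys`, and the variable names are made up for illustration and are not taken from the question. It compares the aggregation `.first()`, the string form `.transform('first')`, and a user-defined function that returns a scalar.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group "a" starts with NaN, group "b" ends with NaN.
s = pd.Series([np.nan, 1.0, 2.0, 3.0, np.nan])
keys = pd.Series(["a", "a", "a", "b", "b"])

# Aggregation: one row per group, holding the first non-NaN value.
agg = s.groupby(keys).first()
print(agg)      # a -> 1.0, b -> 3.0

# Transformation: the same per-group value, repeated for every row of the group,
# so the result has the same index and length as `s`.
out = s.groupby(keys).transform("first")
print(out)      # 1.0, 1.0, 1.0, 3.0, 3.0

# A user-defined function returning a scalar is broadcast in the same way.
# (This lambda assumes every group has at least one non-NaN value.)
custom = s.groupby(keys).transform(lambda g: g.dropna().iloc[0])
print(custom)   # 1.0, 1.0, 1.0, 3.0, 3.0
```

Note that the NaN entries in `s` are filled as well, because every row receives its group's value regardless of what it held before.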

Conclusion:

The lines

 .groupby(df['Date'])
 .transform('first')

first split your intermediate series into groups, one per date, and then, just before the groups are recombined, apply the first() function to the series in every group.

It effectively replaces every value in every group with the first non-NaN value in its series if such a value exists.

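This means that in the resulting series (your new column), every value of the intermediate series is replaced with the first non-NaN value for the same date, if such a value exists. That is why the Actual rows, which were NaN after the first part of the code, end up with the YEP value of the Budget row that has the same date.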
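As a rough check of this conclusion, the sketch below rebuilds only the two pieces that matter for the last step: the Date column and the intermediate YEP values, both copied from the tables in the question. The names `dates`, `yep` and `filled` are made up for this example, and the earlier cumsum steps are not rerun.

```python
import numpy as np
import pandas as pd

# Dates from the question: 12 Budget months (rows 0-11), then 4 Actual months (rows 12-15).
dates = pd.Series(
    list(pd.date_range("2019-01-01", periods=12, freq="MS"))
    + list(pd.date_range("2019-01-01", periods=4, freq="MS"))
)

# Intermediate YEP values from the question's first output (before the transform):
# only the first four Budget rows have a value.
yep = pd.Series([974.0, 1022.0, 1078.0, 1012.0] + [np.nan] * 12)

# Group by date and give every row the first non-NaN value of its date group.
filled = yep.groupby(dates).transform("first")
print(filled)
```

Rows 12 to 15 share their dates with rows 0 to 3, so they pick up 974.0, 1022.0, 1078.0 and 1012.0. The groups for May to December contain only NaN, so rows 4 to 11 stay NaN.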

MarianD answered Nov 18 '25 23:11


