
How to create new row out of calculation across other rows in Pandas Data Frame?

I want to create a new row via row calculations. My DataFrame looks like this:

Para   01.21    02.21   03.21   
<48h     22      25      38 
>48h     0       1        3

I want to calculate, for each column, the percentage of >48h relative to <48h and add it as a new row, such that:

Goal

Para   01.21    02.21   03.21   
<48h     22      25      38 
>48h     0       1        3
Perc.    0%      4%       8%

I had some ideas involving appending a new DataFrame, but nothing was really successful. Thank you for your help!

alexryder asked Oct 15 '25 06:10


1 Answer

Assuming you are happy to modify the input DataFrame in place, this will produce the exact output you've requested:

df.loc['Perc.'] = (
    df.loc['>48h'] / df.loc['<48h']
).apply(lambda x: f"{x:.0%}")

This assumes that the first column in your printed output is the index of the DataFrame.
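As a minimal sketch of that assumption, the table from the question can be rebuilt with Para as the index and the snippet above applied to it. The construction of df here is a guess at how the data is laid out, not something the question confirms:

```python
import pandas as pd

# Assumed layout: "Para" is the index, the month labels are the columns.
df = pd.DataFrame(
    {"01.21": [22, 0], "02.21": [25, 1], "03.21": [38, 3]},
    index=pd.Index(["<48h", ">48h"], name="Para"),
)

# Element-wise ratio of the two rows, one value per column.
ratio = df.loc[">48h"] / df.loc["<48h"]

# Assigning to a label that does not exist yet appends a new row.
df.loc["Perc."] = ratio.apply(lambda x: f"{x:.0%}")

print(df)
```

If Para is an ordinary column rather than the index, calling df = df.set_index("Para") first should bring the frame into this shape.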

To unpack this a bit:

  1. The assignment df.loc['Perc.'] = will create a new row populated with the value on the right-hand side of the =.
  2. df.loc['>48h'] / df.loc['<48h'] computes the desired ratio for each column, as a float.
  3. The .apply call calls a function on each element of the new row, to convert the raw percentage number to a string formatted in the manner you requested. To unpack this a little further:
    • f"{x}" converts x to a string (it is essentially equivalent to str(x))
    • f"{x:.0%}" enhances the above by adding the .0% format specifier, which will display the value as a percentage with 0 decimal places (i.e. multiply by 100, round to 0 decimal places, and then add a % sign). There are various other format specifiers available for floating point data.
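To see what the format specifier does in isolation, here is a small sketch using one ratio from the question (3 / 38, the 03.21 column). The extra specifiers are only illustrations of the alternatives mentioned above:

```python
ratio = 3 / 38  # the 03.21 column: >48h divided by <48h

no_decimals = f"{ratio:.0%}"   # percentage, 0 decimal places
one_decimal = f"{ratio:.1%}"   # percentage, 1 decimal place
fixed_point = f"{ratio:.3f}"   # plain fixed-point, 3 decimal places

print(no_decimals)   # 8%
print(one_decimal)   # 7.9%
print(fixed_point)   # 0.079
```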

Note that if you want to do any further processing on the resulting values, I would recommend skipping step 3: once the data has been converted to strings, it can no longer be used in arithmetic (e.g. multiplied by other data).

If you want the best of both worlds, keeping the internal value as a float ratio, but displaying them as a percentage, you can do this:

df.loc['Perc.'] = (
    df.loc['>48h'] / df.loc['<48h']
)

display_str = df.T.to_string(formatters={"Perc.": lambda x: f"{x:.0%}"})
print(display_str)

This will modify df to add the Perc. row, but its values will stay as the actual float ratio. Then display_str will be an appropriately-formatted string representation of the whole dataframe, including the percentage.

Note that formatters accepts a dict keyed by column names, not row labels, so you must first transpose your data (df.T). As a result, the output is transposed as well:

   Para  <48h  >48h Perc.
0  1.21  22.0   0.0    0%
1  2.21  25.0   1.0    4%
2  3.21  38.0   3.0    8%
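For completeness, here is a self-contained sketch of the float-preserving variant. It assumes the same layout as before (Para as the index), so the printed row and column labels may differ slightly from the output shown above:

```python
import pandas as pd

# Assumed layout: "Para" is the index, the month labels are the columns.
df = pd.DataFrame(
    {"01.21": [22, 0], "02.21": [25, 1], "03.21": [38, 3]},
    index=pd.Index(["<48h", ">48h"], name="Para"),
)

# Store the raw float ratio so it stays usable in later calculations.
df.loc["Perc."] = df.loc[">48h"] / df.loc["<48h"]

# formatters is keyed by column label, hence the transpose.
display_str = df.T.to_string(formatters={"Perc.": lambda x: f"{x:.0%}"})
print(display_str)

# The underlying values are still numbers, so arithmetic keeps working.
in_percent_points = df.loc["Perc."] * 100
print(in_percent_points)
```

The formatting only affects display_str; df itself still holds floats.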

If you're working inside a Jupyter notebook, you can also use the df.style.format method to achieve something similar; calling

df.T.style.format({"Perc.": "{:.0%}"})

returns a pandas.io.formats.style.Styler object, which a Jupyter notebook renders like a DataFrame with the relevant column formatted as above. Again, this only works on a column-by-column basis, so you'll need to transpose.

Dom answered Oct 17 '25 19:10