 

Pandas: How to calculate new column based on index or groupID?

This might be a very simple problem, but I cannot find the solution: I want to add a new column "col_new" whose calculation depends on a group variable such as a group ID or a date, so the formula changes with the group ID.
Example:

   Year  col1  col2
0  2019    10     1
1  2019     4     2
2  2019    25     1
3  2018     3     1
4  2017    56     2
5  2017     3     2


- for Year = 2017: col_new = col1-col2
- for Year = 2018: col_new = col1+col2
- for Year = 2019: col_new = col1*col2
Also, I want to wrap this up in a for loop:

year = [2017, 2018, 2019]
for x in year:
    df["new_col"] = ................
  • Tried using if-functions: they always require an else, so each iteration overwrites all the values set by the previous one.
  • Tried using .loc: it works, but becomes very hard to manage with long, complex conditions.
  • Tried setting Year as the index: that part is easy, but then I am stuck.
import pandas as pd
import numpy as np

d = {'Year': [2019, 2019, 2019, 2018, 2017, 2017],
     'col1': [10, 4, 25, 3, 56, 3],
     'col2': [1, 2, 1, 1, 2, 2]}
df = pd.DataFrame(data=d)  # the example dataframe
df = df.set_index("Year")
print(df)
      col1  col2
Year            
2019    10     1
2019     4     2
2019    25     1
2018     3     1
2017    56     2
2017     3     2

Now I need something like:
- if 2017 then col1+col2
- if 2018 then col1-col2
- if 2019 then col1*col2

asked Oct 18 '25 07:10 by Martin Flower



2 Answers

dict of operators

from operator import sub, add, mul

op = {2019: mul, 2018: add, 2017: sub}

df.assign(new_col=[op[t.Year](t.col1, t.col2) for t in df.itertuples()])

   Year  col1  col2  new_col
0  2019    10     1       10
1  2019     4     2        8
2  2019    25     1       25
3  2018     3     1        4
4  2017    56     2       54
5  2017     3     2        1
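Because the title asks about group IDs, the same dict of operators can also be applied once per group with groupby instead of row by row. This is a sketch assuming Year is still a regular column and the row labels are unique (the default RangeIndex); the name parts is only illustrative.

```python
import pandas as pd
from operator import add, mul, sub

df = pd.DataFrame({'Year': [2019, 2019, 2019, 2018, 2017, 2017],
                   'col1': [10, 4, 25, 3, 56, 3],
                   'col2': [1, 2, 1, 1, 2, 2]})
op = {2019: mul, 2018: add, 2017: sub}

# Apply each year's operator to that group's columns in one vectorized call.
# Every group keeps its original row labels, so after concatenating the pieces
# the assignment aligns each value back to its source row.
parts = [op[year](g['col1'], g['col2']) for year, g in df.groupby('Year')]
df['new_col'] = pd.concat(parts)
print(df)
```

A Year missing from op raises a KeyError here, which makes a gap in the rules visible instead of silently producing a value.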

If Year is in the index

df.assign(new_col=[op[t.Index](t.col1, t.col2) for t in df.itertuples()])

      col1  col2  new_col
Year                     
2019    10     1       10
2019     4     2        8
2019    25     1       25
2018     3     1        4
2017    56     2       54
2017     3     2        1
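Since the question asks for a for loop, a loop version also works: build one boolean mask per year and write only those rows with .loc. This sketch assumes Year is the index, as after the question's set_index call; starting the column as NaN is my own choice so rows without a rule stay visibly empty, which makes the column float.

```python
import numpy as np
import pandas as pd
from operator import add, mul, sub

df = pd.DataFrame({'Year': [2019, 2019, 2019, 2018, 2017, 2017],
                   'col1': [10, 4, 25, 3, 56, 3],
                   'col2': [1, 2, 1, 1, 2, 2]}).set_index('Year')
op = {2019: mul, 2018: add, 2017: sub}

df['new_col'] = np.nan  # rows whose Year has no rule stay NaN
for year in [2017, 2018, 2019]:
    mask = df.index == year  # boolean array selecting this year's rows
    result = op[year](df.loc[mask, 'col1'], df.loc[mask, 'col2'])
    # .to_numpy() writes the values positionally, avoiding label
    # alignment on the duplicated Year index
    df.loc[mask, 'new_col'] = result.to_numpy()
print(df)
```

Unlike the if/else attempt in the question, each iteration only touches its own rows, so earlier results are not overwritten.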
answered Oct 22 '25 06:10 by piRSquared



You can use numpy.select:

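# one boolean mask per year (Year is the index)
cond = [df.index == 2017, df.index == 2018, df.index == 2019]
# the value to use wherever the matching mask is True
choice = [df.col1+df.col2, df.col1-df.col2, df.col1*df.col2]
# rows that match no mask get np.select's default, 0
df['new'] = np.select(cond, choice)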



      col1  col2  new
Year
2019    10     1   10
2019     4     2    8
2019    25     1   25
2018     3     1    2
2017    56     2   58
2017     3     2    5
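The years and formulas can also live in a single dict, with the cond and choice lists generated from it, so adding a year means adding one entry. This is a sketch assuming the same Year index; rules is a hypothetical name, and the extra 2016 row is hypothetical data added only to show that default=np.nan marks rows that no condition matches (otherwise np.select fills them with 0).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Year': [2019, 2019, 2019, 2018, 2017, 2017, 2016],
                   'col1': [10, 4, 25, 3, 56, 3, 7],
                   'col2': [1, 2, 1, 1, 2, 2, 7]}).set_index('Year')

# One formula per year; each takes the whole frame and returns a Series.
rules = {
    2017: lambda d: d.col1 + d.col2,
    2018: lambda d: d.col1 - d.col2,
    2019: lambda d: d.col1 * d.col2,
}

# dict order is preserved, so cond[i] and choice[i] refer to the same year
cond = [df.index == year for year in rules]
choice = [func(df) for func in rules.values()]
# Rows matching no condition (the 2016 row here) get NaN instead of 0.
df['new'] = np.select(cond, choice, default=np.nan)
print(df)
```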
answered Oct 22 '25 05:10 by Vaishali
