Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting Values of columns in all previous rows excluding current row

Tags:

python

pandas

I'm currently learning Pandas and stuck with a problem.

I have the following data:

labels = [ 'date', 'name', 'opponent', 'gf', 'ga']
data = [ 
 [ '2023-08-5', 'Liverpool', 'Man Utd', 5, 0 ],
 [ '2023-08-10', 'Liverpool', 'Everton', 0, 0 ],
 [ '2023-08-14', 'Liverpool', 'Tottenham', 3, 2 ],
 [ '2023-08-18', 'Liverpool', 'Arsenal', 4, 4 ],
 [ '2023-08-27', 'Liverpool', 'Man City', 0, 0 ],
]
df = pd.DataFrame(data, columns=labels)

The games / rows are sorted by date. for each row / game I would like to count the column values of 'goals_for' and 'goals_against' in the previous rows / games (excluding the current row or any after the date).

So I would like the data to be like this:

labels = [ 'date', 'name', 'opponent', 'gf', 'ga', 'total_gf', 'total_ga' ]
data = [ 
 [ '2023-08-5', 'Liverpool', 'Man Utd', 5, 0, 0, 0 ],
 [ '2023-08-10', 'Liverpool', 'Everton', 0, 0, 5, 0 ],
 [ '2023-08-14', 'Liverpool', 'Tottenham', 3, 2, 5, 0 ],
 [ '2023-08-18', 'Liverpool', 'Arsenal', 4, 4, 8, 2 ],
 [ '2023-08-27', 'Liverpool', 'Man City', 0, 0, 12, 6 ],
]

I tried expanding() but it seems to include the current row. rolling has a parameter closed='left' but others don't have it.

Any help or tips or links to similar solutions would be appreciated.

like image 637
Ewan Avatar asked Oct 26 '25 19:10

Ewan


2 Answers

IIUC you can do:

df["total_gf"] = df["gf"].shift(fill_value=0).cumsum()
df["total_ga"] = df["ga"].shift(fill_value=0).cumsum()

print(df)

Prints:

         date       name   opponent  gf  ga  total_gf  total_ga
0   2023-08-5  Liverpool    Man Utd   5   0         0         0
1  2023-08-10  Liverpool    Everton   0   0         5         0
2  2023-08-14  Liverpool  Tottenham   3   2         5         0
3  2023-08-18  Liverpool    Arsenal   4   4         8         2
4  2023-08-27  Liverpool   Man City   0   0        12         6
like image 113
Andrej Kesely Avatar answered Oct 29 '25 07:10

Andrej Kesely


You can shift with fill_value=0, then cumsum:

df['total_gf'] = df['gf'].shift(fill_value=0).cumsum()
df['total_ga'] = df['ga'].shift(fill_value=0).cumsum()

Alternatively, processing all columns at once:

df[['total_gf', 'total_ga']] = df[['gf', 'ga']].shift(fill_value=0).cumsum()

Or, create a new DataFrame:

out = df.join(df[['gf', 'ga']].shift(fill_value=0).cumsum().add_prefix('total_'))

Output:

         date       name   opponent  gf  ga  total_gf  total_ga
0   2023-08-5  Liverpool    Man Utd   5   0         0         0
1  2023-08-10  Liverpool    Everton   0   0         5         0
2  2023-08-14  Liverpool  Tottenham   3   2         5         0
3  2023-08-18  Liverpool    Arsenal   4   4         8         2
4  2023-08-27  Liverpool   Man City   0   0        12         6
like image 45
mozway Avatar answered Oct 29 '25 09:10

mozway