Is there a quicker method for iterating over rows in Python to calculate a feature?

Question

I have a Pandas Dataframe df that details Names of players that play a game. The Dataframe has 2 columns of 'Date' they played a game and their name, sorted by Date.

Date	Name
1993-03-28	Tom
1993-03-28	Joe
1993-03-29	Tom
1993-03-30	Joe

What I am trying to accomplish is to time-efficiently calculate the previous number of games each player has played before they play the upcoming game that day.

For the example Dataframe above, calculating the players previous number of games would start at 0 and look like follows.

Date	Name	Previous Games
1993-03-28	Tom	0
1993-03-28	Joe	0
1993-03-29	Tom	1
1993-03-30	Joe	1

I have tried the following codes and although they have delivered the correct result, they took many days for my computer to run.

Attempt 1:

for i in range(0, len(df) ):
   df['Previous Games'][i] = len( df[ (df['Name'] == df['Name'][i]) & (df['Date'] < df['Date'][i]) ] )

Attempt 2:

df['Previous Games'] = [ len( df[ (df['Name'] == df['Name'][i]) & (df['Date'] < df['Date'][i]) ] ) for i in range(0, len(df) ) ]

Although Attempt 2 was slightly quicker, it was still not time-efficient so I need help in finding a faster method.

Christian Sloper · Accepted Answer

Any time you write "for" and "pandas" anywhere close together you are probably doing something wrong.

It seems to me you want the cumulative count:

df["prev_games"] = df.sort_values('Date').groupby('Name').cumcount()

Is there a quicker method for iterating over rows in Python to calculate a feature?

Tags:

performance

python

for-loop

pandas

vectorization

Fonti

1 Answers

Christian Sloper

Recent Activity

Donate For Us

Is there a quicker method for iterating over rows in Python to calculate a feature?

Tags:

performance

python

for-loop

pandas

vectorization

Fonti

1 Answers

Christian Sloper

Related questions

Recent Activity

Donate For Us