Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Dataframe self-join with complex conditions

I have a DataFrame consisting of a set of data from 2 player games. Each game (with unique id) has a number of rounds where each of the 2 players chose some action. It looks like the following (I've removed some of the rows for clarity):

    gameId  round   player  action
0   Afom9bWqYBgZXXXN8   1   PvQ8B5kuA9Fbq9N59   1
1   Afom9bWqYBgZXXXN8   1   PJmJgrqusFZ8KRShQ   0
2   Afom9bWqYBgZXXXN8   2   PvQ8B5kuA9Fbq9N59   0
3   Afom9bWqYBgZXXXN8   2   PJmJgrqusFZ8KRShQ   0
4   Afom9bWqYBgZXXXN8   3   PJmJgrqusFZ8KRShQ   0
5   Afom9bWqYBgZXXXN8   3   PvQ8B5kuA9Fbq9N59   0
20  QdZM4yPMnjGj8f25R   1   Q6knaWEruc6BDPQT7   1
21  QdZM4yPMnjGj8f25R   1   xnAjMcWaFRpfBbukz   1
22  QdZM4yPMnjGj8f25R   2   xnAjMcWaFRpfBbukz   1
23  QdZM4yPMnjGj8f25R   2   Q6knaWEruc6BDPQT7   0
24  QdZM4yPMnjGj8f25R   3   Q6knaWEruc6BDPQT7   1
25  QdZM4yPMnjGj8f25R   3   xnAjMcWaFRpfBbukz   1
40  riMD6ctT8DLwdhHpE   1   EKkrMpMqy2PRLm7ur   1
41  riMD6ctT8DLwdhHpE   1   EqbbmngPfZBEmPTzq   1
42  riMD6ctT8DLwdhHpE   2   EKkrMpMqy2PRLm7ur   1
43  riMD6ctT8DLwdhHpE   2   EqbbmngPfZBEmPTzq   1
44  riMD6ctT8DLwdhHpE   3   EqbbmngPfZBEmPTzq   1
45  riMD6ctT8DLwdhHpE   3   EKkrMpMqy2PRLm7ur   1
60  hyEjkAg5K4WpubJA9   1   7CHpY4setLKb9ssnN   1
61  hyEjkAg5K4WpubJA9   1   hbud2J3YvitEhj4xZ   0
62  hyEjkAg5K4WpubJA9   2   hbud2J3YvitEhj4xZ   0
63  hyEjkAg5K4WpubJA9   2   7CHpY4setLKb9ssnN   0
64  hyEjkAg5K4WpubJA9   3   7CHpY4setLKb9ssnN   0
65  hyEjkAg5K4WpubJA9   3   hbud2J3YvitEhj4xZ   1
80  ay5pmpeNcwqHJ8JBH   1   tWA9ZxSnKpZyWwYsQ   1
81  ay5pmpeNcwqHJ8JBH   1   2qiHdJgL4WQe5qrHQ   1
82  ay5pmpeNcwqHJ8JBH   2   2qiHdJgL4WQe5qrHQ   1
83  ay5pmpeNcwqHJ8JBH   2   tWA9ZxSnKpZyWwYsQ   1
84  ay5pmpeNcwqHJ8JBH   3   tWA9ZxSnKpZyWwYsQ   1
85  ay5pmpeNcwqHJ8JBH   3   2qiHdJgL4WQe5qrHQ   1

I'd like to add a new column to the DataFrame that contains, for each player's action in a given round, his/her opponent's action in the previous round of the same game, if any. What's a fast, succinct way to do this instead of using a really long (and slow) loop?

Note that within each (gameId, round) key there are only two players with different ids. Dataframe.merge seems like a close match (example), but it would require something like the following:

df.merge(df_copy, left_on=['gameId', 'round', 'player'], \
         right_on=['gameId', df_copy.round - 1, df.player != df_copy.player])

but it can't support the df.player != df_copy.player in a self-join condition.

like image 756
Andrew Mao Avatar asked Oct 29 '25 07:10

Andrew Mao


1 Answers

I think you should start by replacing the player codes with generic aliases, e.g. 1 and 2. You can do it as follows:

s = df.groupby(['gameId', 'player']).size().reset_index(0, drop=True)
s[:] = np.arange(len(s)) % 2 + 1
df['player_alias'] = s.reindex(df.player).values

You can then construct and index of the previous round and opposing player for each row and map it to the corresponding action:

prev_round = df['round'] - 1 
opp_player = 3 - df.player_alias   # effectively maps 2 to 1 and 1 to 2

ix = pd.MultiIndex.from_arrays([df.gameId, prev_round, opp_player])
df['opp_prev_action'] = df.set_index(['gameId', 'round', 'player_alias']
                                     ).reindex(ix).action.values

Note that for round 1, prev_round is 0 which leads to nans in the desired column.

like image 76
JoeCondron Avatar answered Oct 31 '25 05:10

JoeCondron



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!