Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL aggregate unique pairs

I have a PostgreSQL table that is mostly a bridge table but it also has some extra stuff.

Essentially it holds the information about players in a game. So we have a unique id for this instance of a player in a game. Then an id that is FK to game table, and an id that is FK to player table. There is also some other irrelevant stuff. Something like this:

Table players_games
| id        | 12564
| player_id | 556
| game_id   | 156184

What I want to do is find how many occurrences there are of a player playing with another. So, if player1 is in the same game as player2, they have played together once. There are 2+ players in a game.

So what I want to do is populate a new table, that holds three values: player_lo, player_hi, times_played.

And either have one row for each pair and the number of times they played, or if it ends up being more efficient, a row for each iteration and have the value set as 1 so these can be added together later, maybe distributed. So you might see something like:

p1, p2, 1
p1, p2, 1

And these get reduced later to:

p1, p2, 2

So I was wondering if there was some clever way to do this with SQL, or if there's SQL that can reduce my programming effort, before starting to write a slightly complex python script to do it.

like image 227
Tom Carrick Avatar asked May 22 '26 17:05

Tom Carrick


2 Answers

To do this, you need to do a self join on the player_games table. The first subquery is for the first player, and the second for the second player. The "first" player is the one with the lower player id.

select pg1.player_id as player1, pg2.player_id as player2, count(*) as num_games
from (select distinct game_id, player_id
      from  players_games pg
     ) pg1 join
     (select distinct game_id, player_id
      from players_games pg
     ) pg2
     on pg1.game_id = pg2.game_id and
        pg1.player_id < pg2.player_id
group by pg1.player_id, pg2.player_id

Note that the join condition uses a "<" on the player ids. This is to prevent counting duplicates (so players A,B are not also counted as B,A).

Also, I added a "distinct" in the inner subqueries just in case a single player might appear more than once for a given game. Perhaps this is not necessary. To be sure, you should have a unique index on the composite key game_id, player_id.

like image 135
Gordon Linoff Avatar answered May 25 '26 06:05

Gordon Linoff


select p1, p2, count(*) from (
    select 
      pg1.player_id as p1, pg1.game_id, pg2.player_id as p2
    from
      players_games pg1, players_games pg2
    where
      pg1.game_id = pg2.game_id and pg1.player_id != pg2.player_id
) foo
group by p1, p2

Note that this does a full join on players_games so it can be very slow if the table is large. The key part is the group by for getting the count.

like image 45
beerbajay Avatar answered May 25 '26 08:05

beerbajay