Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas dataframe to adjacency matrix

I have a pandas dataframe of the form:

index | id    | group
0     | abc   | A
1     | abc   | B
2     | abc   | B
3     | abc   | C
4     | def   | A
5     | def   | B
6     | ghi   | B
7     | ghi   | C

I would like to transform this to a weighted graph / adjacency matrix where nodes are the 'group', and the weights are the sum of shared ids per group pair:

The weights are the count of the group pair combinations per id, so:

AB = 'abc' indexes (0,1),(0,2) + 'def' indexes (4,5) = 3

AC = 'abc' (0,3) = 1

BC = 'abc' (2,3), (1,3) + 'ghi' (6,7) = 3

and the resulting matrix would be:

    A  |B  |C
A| 0   |3  |1
B| 3   |0  |3
C| 1   |3  |0

At the moment I am doing this very inefficiently by:

f = df.groupby(['id']).agg({'group':pd.Series.nunique}) # to count groups per id
f.loc[f['group']>1] # to get a list of the ids with >1 group

# i then for loop through the id's getting the count of values per pair (takes a long time). 

This is a first pass crude hack approach, I'm sure there must be an alternative approach using groupby or crosstab but I cant figure it out.

like image 327
BenP Avatar asked Dec 13 '25 10:12

BenP


2 Answers

You can use the following:

df_merge = df.merge(df, on='id')
results = pd.crosstab(df_merge.group_x, df_merge.group_y)
np.fill_diagonal(results.values, 0)
results

Output:

group_y  A  B  C
group_x         
A        0  3  1
B        3  0  3
C        1  3  0

Note: the difference i your result and my result C-B and B-C three instead of two, is due to duplicate records for B-abc index row 1 and 2.

like image 126
Scott Boston Avatar answered Dec 16 '25 00:12

Scott Boston


Maybe try dot

s=pd.crosstab(df.id,df.group)
s=s.T.dot(s)
s.values[[np.arange(len(s))]*2] = 0
s
Out[15]: 
group  A  B  C
group         
A      0  3  1
B      3  0  3
C      1  3  0
like image 38
BENY Avatar answered Dec 15 '25 22:12

BENY



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!