Transform DataFrame containing ID pairs into a list of sets

I have a pandas DataFrame with the following structure:

left_id right_id
a b
c a
x y

I need to transform this into a list of sets, like this:

[
  {'a', 'b', 'c'},
  {'x', 'y'}
]

The first two rows should be combined into a single set: row 1 contains IDs a and b, and row 2 contains IDs c and a, which in this DataFrame means all three IDs are related.

What is the right way to do this?

Joe Fusaro asked Dec 03 '25 21:12



1 Answer

You can group connected IDs using NetworkX or a simple union-find approach.

import pandas as pd
import networkx as nx

df = pd.DataFrame({'left_id': ['a', 'c', 'x'], 'right_id': ['b', 'a', 'y']})

G = nx.from_pandas_edgelist(df, 'left_id', 'right_id')
result = [set(c) for c in nx.connected_components(G)]

print(result)
# [{'a', 'b', 'c'}, {'x', 'y'}]  (element order inside each set may vary)

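This builds an undirected graph with one edge per row (left_id to right_id), then extracts each connected component as a set, so IDs linked through any chain of rows end up together.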
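The union-find approach mentioned above is not shown in the answer, so here is a minimal sketch of what it might look like without NetworkX. The helper name `connected_id_sets` is hypothetical, and it assumes the IDs are hashable and that the input is an iterable of (left, right) pairs.

```python
def connected_id_sets(pairs):
    """Group IDs linked by any chain of pairs into sets (union-find sketch)."""
    parent = {}  # maps each ID to its parent; a root is its own parent

    def find(x):
        # Register unseen IDs as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two IDs of every pair into the same group.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect every ID under the root of its group.
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


# Same ID pairs as the DataFrame rows in the question.
pairs = [("a", "b"), ("c", "a"), ("x", "y")]
result = connected_id_sets(pairs)
print(result)
```

With the DataFrame from the answer, the pairs could be passed as `zip(df['left_id'], df['right_id'])`. This avoids the extra dependency, while the NetworkX version is shorter if the library is already installed.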

Lavi Kumar answered Dec 05 '25 10:12



