I have a pandas DataFrame with the following structure:
| left_id | right_id |
|---|---|
| a | b |
| c | a |
| x | y |
I need to transform this into a list of sets, like this:
[
{'a', 'b', 'c'},
{'x', 'y'}
]
The first two rows should be combined into a single set: row 1 has IDs a and b, and row 2 has IDs c and a, which in this DataFrame means the three IDs are related.
What is the right way to do this?
You can group connected IDs by treating each row as an edge in a graph, using NetworkX or a simple union-find approach. With NetworkX:
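import pandas as pd
import networkx as nx
df = pd.DataFrame({'left_id': ['a', 'c', 'x'], 'right_id': ['b', 'a', 'y']})
# Treat every row as an undirected edge between left_id and right_id.
G = nx.from_pandas_edgelist(df, 'left_id', 'right_id')
# Each connected component is one group of related IDs.
result = [set(c) for c in nx.connected_components(G)]
print(result)
# e.g. [{'a', 'b', 'c'}, {'x', 'y'}] (sets are unordered, so the printed order may differ)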
This builds an undirected graph in which each row is an edge between two IDs, then extracts its connected components as sets.
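If you would rather not depend on NetworkX, the union-find approach mentioned above could look roughly like the sketch below. The function name `connected_sets` and the plain list of pairs are assumptions made for illustration. The sketch uses path halving without union by rank, which should be fine for modest inputs.

```python
def connected_sets(pairs):
    """Group IDs that are linked, directly or transitively, by any pair.

    `connected_sets` is a hypothetical helper name, not part of any library.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union step: merge the two groups that each pair connects.
    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_left] = root_right

    # Collect every ID under the root of its group.
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Same data as the DataFrame in the question, written as (left_id, right_id) pairs.
pairs = [("a", "b"), ("c", "a"), ("x", "y")]
result = connected_sets(pairs)
print(result)
```

With the DataFrame from the question, the pairs can be taken straight from the two columns, for example `connected_sets(zip(df['left_id'], df['right_id']))`.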