Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to generate graph from Pandas dataframe with full hierachical data?

Having a Pandas dataframe with sample hierarchical data i.e.

data = pd.DataFrame({
"manager_id": ["A", "A", "B", "A", "C", "A", "B"],
"employee_id": ["B", "C", "C", "D", "E", "E", "E"]
} )

Given that the data consists of all the descendent relationships for each manager. For example in each manager id (e.g. "A"), the employee id comprises both the employees (e.g. "B") directly managed by manager "A" and the employees managed by employee "B" (e.g. "C", "D"). How can this be represented into a directed graph in networkx? The edges of the graph should be

[('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'E')]
like image 894
Chongxi Hong Avatar asked Nov 19 '25 15:11

Chongxi Hong


2 Answers

Read the DataFrame as a directed graph with from_pandas_edgelist, then sort the nodes with topological_sort and only keep the edge with the parent that has the greatest topological index per child:

G = nx.from_pandas_edgelist(data, source='manager_id', target='employee_id',
                            create_using=nx.DiGraph)

# topological order
order = {n: i for i,n in enumerate(nx.topological_sort(G))}
# {'A': 0, 'B': 1, 'D': 2, 'C': 3, 'E': 4}

# for each node, only keep the parent that has the greatest topological index
parents = {}
for source, target in G.edges():
    p = parents.setdefault(target, source)
    if order[source] > order[p]:
        parents[target] = source

# parents
# {'C': 'B', 'E': 'C', 'B': 'A', 'D': 'A'}

# remove shorter edges
G.remove_edges_from(G.edges - set(zip(parents.values(), parents.keys())))
# or
# G = G.edge_subgraph(list(zip(parents.values(), parents.keys())))

Output:

networkx remove extra edges with topological sort

Graph before filtering:

networkx input graph with duplicated paths

Variant

You can also compute an order with topological_generations to have the same number for nodes of the same generation.

order = {n: i for i, l in enumerate(nx.topological_generations(G)) for n in l}
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 3}

In link to your other question, you could also compute the relative generation difference and only keep the edges with a generation difference of 1 between the two nodes:

G = nx.from_pandas_edgelist(data, source='manager_id', target='employee_id',
                            create_using=nx.DiGraph)

order = {n: i for i, l in enumerate(nx.topological_generations(G)) for n in l}
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 3}

# compute the relative generation difference
# keep the edges with a difference of 1
keep = [e for e in G.edges if order[e[1]]-order[e[0]] == 1]
# [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'E'), ('F', 'G')]

G = G.edge_subgraph(list(zip(parents.values(), parents.keys())))
like image 70
mozway Avatar answered Nov 21 '25 05:11

mozway


At first i didn't properly understand what you were trying to acomplish, now i've created an iterate approach to do it:

import networkx as nx
import pandas as pd

man_id ='A'
NodeCons=[]
employees =[]
data = pd.DataFrame({
"manager_id": ["A", "A", "B", "A", "C", "A", "B"],
"employee_id": ["B", "C", "C", "D", "E", "E", "E"]
} )

def searchEmployees(man):
    employees =[]
    for index,row in data.iterrows():
        
        if row['manager_id'] == man:
            NodeCons.append((row['manager_id'],row['employee_id']))
            
            employees.append(row['employee_id'])
     
            searchEmployees(row['employee_id'])
       
searchEmployees(man_id) 

Then create Graph out of the list:

G = nx.Graph()
G.add_edges_from(NodeCons)


nx.draw_networkx(G)
like image 43
Bending Rodriguez Avatar answered Nov 21 '25 04:11

Bending Rodriguez



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!