How to generate graph from Pandas dataframe with full hierachical data?

Question

Having a Pandas dataframe with sample hierarchical data i.e.

data = pd.DataFrame({
"manager_id": ["A", "A", "B", "A", "C", "A", "B"],
"employee_id": ["B", "C", "C", "D", "E", "E", "E"]
} )

Given that the data consists of all the descendent relationships for each manager. For example in each manager id (e.g. "A"), the employee id comprises both the employees (e.g. "B") directly managed by manager "A" and the employees managed by employee "B" (e.g. "C", "D"). How can this be represented into a directed graph in networkx? The edges of the graph should be

[('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'E')]

mozway · Accepted Answer

Read the DataFrame as a directed graph with from_pandas_edgelist, then sort the nodes with topological_sort and only keep the edge with the parent that has the greatest topological index per child:

G = nx.from_pandas_edgelist(data, source='manager_id', target='employee_id',
                            create_using=nx.DiGraph)

# topological order
order = {n: i for i,n in enumerate(nx.topological_sort(G))}
# {'A': 0, 'B': 1, 'D': 2, 'C': 3, 'E': 4}

# for each node, only keep the parent that has the greatest topological index
parents = {}
for source, target in G.edges():
    p = parents.setdefault(target, source)
    if order[source] > order[p]:
        parents[target] = source

# parents
# {'C': 'B', 'E': 'C', 'B': 'A', 'D': 'A'}

# remove shorter edges
G.remove_edges_from(G.edges - set(zip(parents.values(), parents.keys())))
# or
# G = G.edge_subgraph(list(zip(parents.values(), parents.keys())))

Output:

networkx remove extra edges with topological sort

Graph before filtering:

networkx input graph with duplicated paths

Variant

You can also compute an order with topological_generations to have the same number for nodes of the same generation.

order = {n: i for i, l in enumerate(nx.topological_generations(G)) for n in l}
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 3}

In link to your other question, you could also compute the relative generation difference and only keep the edges with a generation difference of 1 between the two nodes:

G = nx.from_pandas_edgelist(data, source='manager_id', target='employee_id',
                            create_using=nx.DiGraph)

order = {n: i for i, l in enumerate(nx.topological_generations(G)) for n in l}
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 3}

# compute the relative generation difference
# keep the edges with a difference of 1
keep = [e for e in G.edges if order[e[1]]-order[e[0]] == 1]
# [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'E'), ('F', 'G')]

G = G.edge_subgraph(list(zip(parents.values(), parents.keys())))

Bending Rodriguez · Answer

At first i didn't properly understand what you were trying to acomplish, now i've created an iterate approach to do it:

import networkx as nx
import pandas as pd

man_id ='A'
NodeCons=[]
employees =[]
data = pd.DataFrame({
"manager_id": ["A", "A", "B", "A", "C", "A", "B"],
"employee_id": ["B", "C", "C", "D", "E", "E", "E"]
} )

def searchEmployees(man):
    employees =[]
    for index,row in data.iterrows():
        
        if row['manager_id'] == man:
            NodeCons.append((row['manager_id'],row['employee_id']))
            
            employees.append(row['employee_id'])
     
            searchEmployees(row['employee_id'])
       
searchEmployees(man_id)

Then create Graph out of the list:

G = nx.Graph()
G.add_edges_from(NodeCons)


nx.draw_networkx(G)

How to generate graph from Pandas dataframe with full hierachical data?

Tags:

python

networkx

Chongxi Hong

2 Answers

Variant

mozway

Bending Rodriguez

Recent Activity

Donate For Us

How to generate graph from Pandas dataframe with full hierachical data?

Tags:

python

networkx

Chongxi Hong

2 Answers

Variant

mozway

Bending Rodriguez

Related questions

Recent Activity

Donate For Us