Having a Pandas dataframe with sample hierarchical data i.e.
data = pd.DataFrame({
"manager_id": ["A", "A", "B", "A", "C", "A", "B"],
"employee_id": ["B", "C", "C", "D", "E", "E", "E"]
} )
Given that the data consists of all the descendent relationships for each manager. For example in each manager id (e.g. "A"), the employee id comprises both the employees (e.g. "B") directly managed by manager "A" and the employees managed by employee "B" (e.g. "C", "D"). How can this be represented into a directed graph in networkx? The edges of the graph should be
[('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'E')]
Read the DataFrame as a directed graph with from_pandas_edgelist, then sort the nodes with topological_sort and only keep the edge with the parent that has the greatest topological index per child:
G = nx.from_pandas_edgelist(data, source='manager_id', target='employee_id',
create_using=nx.DiGraph)
# topological order
order = {n: i for i,n in enumerate(nx.topological_sort(G))}
# {'A': 0, 'B': 1, 'D': 2, 'C': 3, 'E': 4}
# for each node, only keep the parent that has the greatest topological index
parents = {}
for source, target in G.edges():
p = parents.setdefault(target, source)
if order[source] > order[p]:
parents[target] = source
# parents
# {'C': 'B', 'E': 'C', 'B': 'A', 'D': 'A'}
# remove shorter edges
G.remove_edges_from(G.edges - set(zip(parents.values(), parents.keys())))
# or
# G = G.edge_subgraph(list(zip(parents.values(), parents.keys())))
Output:

Graph before filtering:

You can also compute an order with topological_generations to have the same number for nodes of the same generation.
order = {n: i for i, l in enumerate(nx.topological_generations(G)) for n in l}
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 3}
In link to your other question, you could also compute the relative generation difference and only keep the edges with a generation difference of 1 between the two nodes:
G = nx.from_pandas_edgelist(data, source='manager_id', target='employee_id',
create_using=nx.DiGraph)
order = {n: i for i, l in enumerate(nx.topological_generations(G)) for n in l}
# {'A': 0, 'B': 1, 'D': 1, 'C': 2, 'E': 3}
# compute the relative generation difference
# keep the edges with a difference of 1
keep = [e for e in G.edges if order[e[1]]-order[e[0]] == 1]
# [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'E'), ('F', 'G')]
G = G.edge_subgraph(list(zip(parents.values(), parents.keys())))
At first i didn't properly understand what you were trying to acomplish, now i've created an iterate approach to do it:
import networkx as nx
import pandas as pd
man_id ='A'
NodeCons=[]
employees =[]
data = pd.DataFrame({
"manager_id": ["A", "A", "B", "A", "C", "A", "B"],
"employee_id": ["B", "C", "C", "D", "E", "E", "E"]
} )
def searchEmployees(man):
employees =[]
for index,row in data.iterrows():
if row['manager_id'] == man:
NodeCons.append((row['manager_id'],row['employee_id']))
employees.append(row['employee_id'])
searchEmployees(row['employee_id'])
searchEmployees(man_id)
Then create Graph out of the list:
G = nx.Graph()
G.add_edges_from(NodeCons)
nx.draw_networkx(G)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With