Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to group dictionary keys based on their list values matching condition?

I would like to group the keys of dictionary based on the list value assigned to the key. For example,

edge_dict = {'ED7':[[15,-10,0], [20,-10,0]], 'ED10':[[0,0,0],[10,0,0]], 'ED13':[[15,0,0], [20,0,0]], 'ED4':[[20,0,0], [25,0,0]], 'ED2':[[15,0,0], [10,0,0]], 'ED5':[[0,-10,0],[10,-10,0]], 'ED6':[[15,-10,0], [10,-10,0]], 'ED8':[[20,-10,0], [25,-10,0]]}

rib1_edges  :  ['ED10', 'ED5']

rib2_edges  :  ['ED4', 'ED8']

Dictionary "edge_dict " have a Key 'ED10' whose sub list ([10,0,0]) is equal to another key in the same dictionary, that is, 'ED2'. Likewise, i would like to collect and group all those keys in edge_dict which are connected by a common sublist value.

Expected answer is

selected_group_1 = {'ED10':[[0,0,0],[10,0,0]], 'ED2':[[15,0,0], [10,0,0]], 'ED13':[[15,0,0], [20,0,0]], 'ED4':[[20,0,0], [25,0,0]]}
selected_group_2 = {'ED5':[[0,-10,0],[10,-10,0]], 'ED6':[[15,-10,0], [10,-10,0]], 'ED7':[[15,-10,0], [20,-10,0]], 'ED8':[[20,-10,0], [25,-10,0]]}

It should be noted that the selected groups have an order from values in list rib1_edges to rib2_edges. That means one group starts with key 'ED10' and ends in either 'ED4' or 'ED8' depending on how values are arranged in edge_dict.

Can anyone help me to group the dictionary "edge_dict" as shown in the expected answer?

I started off something like this,

import math

def is_equal(pnt1, pnt2):
    L = math.sqrt((pnt1[0]-pnt2[0]) ** 2 + (pnt1[1]-pnt2[1]) ** 2 + (pnt1[2]-pnt2[2]) ** 2)
    if L<1e-4:
        return True
    else:
        return False

for i in range(len(list(edge_dict.keys()))/2):
    for ed, pnts in {k:v for k, v in edge_dict.items() if k not in list(selected_group_1.keys())}.items():
        if (any(is_equal(selected_group_1[edge][0], pnts[0]) or any(is_equal(selected_group_1[edge][1], pnts[0]) or any(is_equal(selected_group_1[edge][0], pnts[1]) or any(is_equal(selected_group_1[edge][1], pnts[1])):
            selected_group_1[ed] = pnts

I could not put the logic properly. Any help would be appreciated.

like image 554
Paul Thomas Avatar asked Dec 01 '25 05:12

Paul Thomas


1 Answers

You're basically asking for an algorithm that can calculate connected components in a graph. Although you could write it by hand using either DFS or BFS, sometimes it's more convenient to use a ready-made solution such as scipy.sparse.csgraph.connected_components .

You have to transform your graph in a way that can be digested by the algorithm.

Invert keys and values

Invert keys and values and build the dictionary dct.

from collections import defaultdict
from scipy.sparse.csgraph import connected_components
from scipy.sparse import csr_matrix

edge_dict = {
    'ED7':[[15,-10,0], [20,-10,0]],
    'ED10':[[0,0,0],[10,0,0]],
    'ED13':[[15,0,0], [20,0,0]],
    'ED4':[[20,0,0], [25,0,0]],
    'ED2':[[15,0,0], [10,0,0]],
    'ED5':[[0,-10,0],[10,-10,0]],
    'ED6':[[15,-10,0], [10,-10,0]],
    'ED8':[[20,-10,0], [25,-10,0]]
}

edge_pairs = [ ['ED10', 'ED5'], ['ED4', 'ED8'] ]

dct = defaultdict(list)

for item in edge_dict.items():
    key = item[0]
    value = item[1]
    for lst in value:
        hashed_lst = str(lst)
        dct[hashed_lst].append(key)

print(dct)

This will be the output of dct.

# defaultdict(<class 'list'>, {'[15, -10, 0]': ['ED7', 'ED6'],
# '[20, -10, 0]': ['ED7', 'ED8'],
# '[0, 0, 0]': ['ED10'],
# '[10, 0, 0]': ['ED10', 'ED2'],
# '[15, 0, 0]': ['ED13', 'ED2'],
# '[20, 0, 0]': ['ED13', 'ED4'],
# '[25, 0, 0]': ['ED4'],
# '[0, -10, 0]': ['ED5'],
# '[10, -10, 0]': ['ED5', 'ED6'],
# '[25, -10, 0]': ['ED8']})

Adjacency List

  • Build an adjacency list of your ED labels and call it graph_dct. If there's an edge between two labels, then they belong to the same group.

    graph_dct = defaultdict(list)

    for lst in dct.values(): for item in lst: for item2 in lst: if item != item2: graph_dct[item].append(item2)

    print(graph_dct)

This will be the output of graph_dct.

    # graph_dct :
    # defaultdict(<class 'list'>, {'ED7': ['ED6', 'ED8'],
    # 'ED6': ['ED7', 'ED5'],
    # 'ED8': ['ED7'], 'ED10': ['ED2'],
    # 'ED2': ['ED10', 'ED13']
    # 'ED13': ['ED2', 'ED4'],
    # 'ED4': ['ED13'], 'ED5': ['ED6']})

Adjacency Matrix

This is a requirement of the scipy connected components algorithm. We must transform our adjacency list into an adjacency matrix. It will be called sparse_matrix.

graph_dct_keys = list(graph_dct.keys());

sparse_matrix = []
for key in graph_dct_keys:
    row = [0] * len(graph_dct_keys)
    for item in graph_dct[key]:
        row[graph_dct_keys.index(item)] = 1
    sparse_matrix.append(row)

Algorithm

Now we pass sparse_matrix to the connected components algorithm and then it will give you the groups.

components = csr_matrix(sparse_matrix)
n_components, labels = connected_components(csgraph=components, directed=False, return_labels=True)

labels_dct = defaultdict(list)

for idx, item in enumerate(labels):
    labels_dct[str(item)].append(graph_dct_keys[idx])

print(labels_dct.values())
# dict_values([
# ['ED7', 'ED6', 'ED8', 'ED5'],
# ['ED10', 'ED2', 'ED13', 'ED4']
# ])

results = list(labels_dct.values())

Now you have results, which is a list of lists of labels that can be used to build your expected answer very easily.

Sorting

Next we'll sort results according to the requirements and generate the expected answer.

results = [['ED7', 'ED6', 'ED8', 'ED5'], ['ED10', 'ED2', 'ED13', 'ED4']]
rib1_edges = ['ED10', 'ED5']
rib2_edges = ['ED4', 'ED8']
for idx,lst in enumerate(results):
    results[idx] = sorted(lst, key=lambda x: int(x.lstrip('ED')) )
for idx,lst in enumerate(results):
    results[idx] = [ x for x in rib1_edges if x in lst ] + \
        [ x for x in lst if x not in rib1_edges and x not in rib2_edges ] + \
        [ x for x in rib2_edges if x in lst ]

print(results)

This will give the desired result, [['ED5', 'ED6', 'ED7', 'ED8'], ['ED10', 'ED2', 'ED13', 'ED4']] .

Getting the Answer

Notice that there's no guarantee that this dictionary will be insertion ordered if you're using a version of Python below 3.6 . In that case, the keys may appear in any seemingly random order determined by Python's internal algorithm.

So if you're running Python 3.6+ it's ok to use a dictionary, but for portability, it's better to go with an OrderedDict.

from collections import OrderedDict

edge_dict = {
    'ED7':[[15,-10,0], [20,-10,0]],
    'ED10':[[0,0,0],[10,0,0]],
    'ED13':[[15,0,0], [20,0,0]],
    'ED4':[[20,0,0], [25,0,0]],
    'ED2':[[15,0,0], [10,0,0]],
    'ED5':[[0,-10,0],[10,-10,0]],
    'ED6':[[15,-10,0], [10,-10,0]],
    'ED8':[[20,-10,0], [25,-10,0]]
}

result = [
          ['ED5', 'ED6', 'ED7', 'ED8'],
          ['ED10', 'ED2', 'ED13', 'ED4']
]

answer = []
for lst in result:
    od = OrderedDict()
    for item in lst:
        od[item] = edge_dict[item]
    answer.append(od)

import pprint
pp = pprint.PrettyPrinter()
pp.pprint(answer)

And then this will give us the expected answer.

[OrderedDict([('ED5', [[0, -10, 0], [10, -10, 0]]),
              ('ED6', [[15, -10, 0], [10, -10, 0]]),
              ('ED7', [[15, -10, 0], [20, -10, 0]]),
              ('ED8', [[20, -10, 0], [25, -10, 0]])]),
 OrderedDict([('ED10', [[0, 0, 0], [10, 0, 0]]),
              ('ED2', [[15, 0, 0], [10, 0, 0]]),
              ('ED13', [[15, 0, 0], [20, 0, 0]]),
              ('ED4', [[20, 0, 0], [25, 0, 0]])])]
like image 173
Ruan Avatar answered Dec 03 '25 19:12

Ruan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!