Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best algorithm to solve this array problem?

If the for the input, I had a list of lists like so

[[2,1], [1,3], [4,5], [6,8], [4,7]]

How could I find a/the largest subset such that there are no duplicate/shared elements at all?

For this example, an answer would be

[2,1], [4,5], [6,8] 

or

[1,3], [6,8], [4,7] 

but an invalid answer would be

[2,1],[1,3],[4,5] 

since 1 appears twice here.

My current approach is a recursive approach that uses the idea that either I choose the first element or I don't choose the first element while building the subset but this approach seems way too slow.

What I have: Currently it only returns the size of the largest subset rather than the actual subset but it should be easy to get that once the size is working

def disjointSets(allSets):
    maxCount = 0
    used = set()
    def findDisjoint(currCount, arr):
        nonlocal maxCount
        if not arr:
            maxCount = max(maxCount, currCount)
            return
        elif arr[0][0] in used or arr[0][1] in used:
            findDisjoint(currCount, arr[1:])
            return
        else:
            used.add(arr[0][0])
            used.add(arr[0][1])
            findDisjoint(currCount + 1, arr[1:])
            used.clear()
            findDisjoint(currCount, arr[1:])
        return
    findDisjoint(0, allSets)
    return maxCount
like image 324
Pryver Johnson Avatar asked Sep 01 '25 01:09

Pryver Johnson


2 Answers

What you have here is a graph -- numbers are vertices, and pairs of numbers are edges.

Your problem is to find the largest set of independent edges (edges that don't share a vertex).

This is usually called a maximum matching, and that is a well known problem with well known solutions. The Blossom algorithm is the simplest practical way that is guaranteed to find the best answer.

like image 74
Matt Timmermans Avatar answered Sep 02 '25 15:09

Matt Timmermans


As @MattTimmermans says, what you have here is an undirected graph where each number is a node and each pair is an edge. A matching is a set of edges that don't share nodes and your job is to find the maximum matching, i.e. a matching that contains the maximum number of edges.

There's a Python package called networkx that deals with networks, in particular, graphs. There's a method in it called max_weight_matching that implements Edmonds' blossom algorithm.

In your example:

import networkx as nx
G = nx.Graph()
G.add_edges_from([[2,1], [1,3], [4,5], [6,8], [4,7]])
out = nx.max_weight_matching(G)

Output:

{(8, 6), (4, 7), (1, 3)}

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!