Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for assigning items to groups based on binary criteria

I'm working on an algorithmic problem that has me stumped. It is related to both the assignment problem and maximum cardinality matching. I'll be using it to match large amounts of data so it's important to be fast, ideally O(n2) or better.

Problem Statement

Consider a set of universities U and a set of prospective students S. Assume that |U| << |S|. Each university has a specific number of slots available to admit students, and a binary criteria for whether each student in S is eligible or ineligible to attend. The number of prospective students is equal to the total number of slots available across all universities.

Is there a fast algorithm that will assign students to universities such that:

  • Each university admits exactly the number of students they wanted
  • Only students eligible for a university will be assigned to that university
  • If those conditions are not possible, it will return some value indicating that it was not possible (either a best possible incomplete match for the conditions or just -1/error/etc)

What I've considered

This can be modeled as a maximum-cardinality bipartite matching problem.

Construct the bipartite as follows:

  • Each prospective student is a node
  • Each university is a node
  • For each university, draw an edge to each eligible student
  • Duplicate each university node to represent the number of available slots for that university

The Hopcraft-Karp algorithm can find a maximum-cardinality matching in roughly O(n2.5) (where n is the number of students). If the cardinality of the match is equal to the number of students, the problem has been solved. If not, a solution is impossible for the given dataset.

However this approach seems to be inefficient because there are far more students than universities. Each university node has to be duplicated many times, resulting in a huge number of edges for the Hopcraft-Karp algorithm to process.

Is there a better way of abstracting this problem that I'm not seeing? Or is there a clever algorithm that can make use of the fact that there are far fewer universities than students and improve the runtime?

like image 849
Brooklyn Rose Ludlow Avatar asked Jun 24 '26 23:06

Brooklyn Rose Ludlow


1 Answers

There’s a simple O(|S| |U|²)-time algorithm, which meets your criterion in the worst case if |U| = O(√|S|) and likely does a lot better than the quoted running time in practice. (See also “bipush” variants of standard flow algorithms.)

The basis of this algorithm is Ford–Fulkerson. For each student in turn, we try to assign them a university. If we’re lucky, one of their choices has capacity, but in general, we need to shuffle other students around to make capacity (find an augmenting path).

Now, what typically makes finding an augmenting path expensive is that we have to traverse the whole graph. However! If we keep a |U|×|U| array where the (i, j) entry is a list of students that can move from university i to university j, then this takes O(|U|²) time (breadth-first search from the set of admissible universities of the new student; we only need to know if each cell of the array is empty). We (re)assign at most |U| students, each of which generates O(|U|) work to update the lists, so the total cost per iteration is O(|U|²).

One nice property of this algorithm in practice (many flow algorithms, actually) is that, if you feed it a partial assignment with k unmatched students, then the running time is O(|S| |U| + k |U|²). The idea would be to find a fast heuristic that does a reasonable but incomplete job (e.g., shuffle the students and try to assign each one to the admissible university that is least popular with other students).

like image 165
David Eisenstat Avatar answered Jun 27 '26 22:06

David Eisenstat



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!