I'm working on an algorithmic problem that has me stumped. It is related to both the assignment problem and maximum cardinality matching. I'll be using it to match large amounts of data so it's important to be fast, ideally O(n2) or better.
Consider a set of universities U and a set of prospective students S. Assume that |U| << |S|. Each university has a specific number of slots available to admit students, and a binary criteria for whether each student in S is eligible or ineligible to attend. The number of prospective students is equal to the total number of slots available across all universities.
Is there a fast algorithm that will assign students to universities such that:
This can be modeled as a maximum-cardinality bipartite matching problem.
Construct the bipartite as follows:
The Hopcraft-Karp algorithm can find a maximum-cardinality matching in roughly O(n2.5) (where n is the number of students). If the cardinality of the match is equal to the number of students, the problem has been solved. If not, a solution is impossible for the given dataset.
However this approach seems to be inefficient because there are far more students than universities. Each university node has to be duplicated many times, resulting in a huge number of edges for the Hopcraft-Karp algorithm to process.
Is there a better way of abstracting this problem that I'm not seeing? Or is there a clever algorithm that can make use of the fact that there are far fewer universities than students and improve the runtime?
There’s a simple O(|S| |U|²)-time algorithm, which meets your criterion in the worst case if |U| = O(√|S|) and likely does a lot better than the quoted running time in practice. (See also “bipush” variants of standard flow algorithms.)
The basis of this algorithm is Ford–Fulkerson. For each student in turn, we try to assign them a university. If we’re lucky, one of their choices has capacity, but in general, we need to shuffle other students around to make capacity (find an augmenting path).
Now, what typically makes finding an augmenting path expensive is that we have to traverse the whole graph. However! If we keep a |U|×|U| array where the (i, j) entry is a list of students that can move from university i to university j, then this takes O(|U|²) time (breadth-first search from the set of admissible universities of the new student; we only need to know if each cell of the array is empty). We (re)assign at most |U| students, each of which generates O(|U|) work to update the lists, so the total cost per iteration is O(|U|²).
One nice property of this algorithm in practice (many flow algorithms, actually) is that, if you feed it a partial assignment with k unmatched students, then the running time is O(|S| |U| + k |U|²). The idea would be to find a fast heuristic that does a reasonable but incomplete job (e.g., shuffle the students and try to assign each one to the admissible university that is least popular with other students).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With