Top-k scoring across multiple dimensions

In machine learning, e.g. natural language processing, a beam search is often used to predict and rank candidate next items to append to a sequence. A key part of the beam search is the top-k scoring step, which is effectively: given a list of N probability scores, return the k highest-scoring of the N items. This is as simple as sorting the list and taking the top k values.
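For a single list, that step might look like the following sketch (names are illustrative; np.argpartition places the k largest in the first k slots in O(N), so only those k need sorting):

```python
import numpy as np

scores = np.array([0.1, 0.9, 0.4, 0.7, 0.2])
k = 3

# Partition so the k largest land in the first k slots (O(N)), then sort just those.
top_idx = np.argpartition(-scores, k)[:k]
top_idx = top_idx[np.argsort(-scores[top_idx])]
print(top_idx, scores[top_idx])   # → [1 3 2] [0.9 0.7 0.4]
```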
Referring to a visual example of a beam search https://www.researchgate.net/figure/A-partially-completed-beam-search-procedure-with-a-beam-width-of-5-for-an-example-input_fig2_317377611 (there, k=5, and a 'top' score is a minimal value): at each iteration, each node selects the top k items from its N choices, giving k^2 potential paths in total. From these paths, the top k overall are kept, and these form the nodes for the next iteration. In that figure, only the filtered nodes are shown at each time-step. https://d2l.ai/_images/beam-search.svg works through the case k=2, N=5 comprehensively.

Imagine, instead of optimising one choice from N for each branch/node, you had to choose multiple values: when exploring from a node, you have a matrix of choices of shape (N, q), from which you want to select q values, one from each of the q columns. To find the highest-scoring sets of choices, you then need to consider combinations of the values across these columns. For example, for a matrix of choices with N=5, q=4:

+---+--------+--------+--------+--------+
| N |   q0   |   q1   |   q2   |   q3   |
+---+--------+--------+--------+--------+
| 0 | 0.9763 | 0.0791 | 0.1530 | 0.5565 |
| 1 | 0.1560 | 0.1014 | 0.6932 | 0.7551 |
| 2 | 0.8142 | 0.9494 | 0.4582 | 0.4411 |
| 3 | 0.3807 | 0.2403 | 0.6897 | 0.7356 |
| 4 | 0.0156 | 0.9419 | 0.9568 | 0.2266 |
+---+--------+--------+--------+--------+

If k=5, this top-k function should return the following:

  1. 3.6376 = q0[0] + q1[2] + q2[4] + q3[1]
  2. 3.6301 = q0[0] + q1[4] + q2[4] + q3[1]
  3. 3.6181 = q0[0] + q1[2] + q2[4] + q3[3]
  4. 3.6106 = q0[0] + q1[4] + q2[4] + q3[3]
  5. 3.4755 = q0[2] + q1[2] + q2[4] + q3[1]

which are the largest possible sums, using one value from each column.
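These can be verified by brute force: enumerate all N^q = 625 index combinations and keep the five largest sums (a throwaway check, not the algorithm):

```python
import numpy as np
from itertools import product

matrix = np.array([
    [0.9763, 0.0791, 0.1530, 0.5565],
    [0.1560, 0.1014, 0.6932, 0.7551],
    [0.8142, 0.9494, 0.4582, 0.4411],
    [0.3807, 0.2403, 0.6897, 0.7356],
    [0.0156, 0.9419, 0.9568, 0.2266],
])
N, q = matrix.shape
cols = np.arange(q)

# One row index per column: N**q candidate sums in total.
all_sums = sorted(
    ((matrix[list(rows), cols].sum(), rows) for rows in product(range(N), repeat=q)),
    reverse=True,
)
for total, rows in all_sums[:5]:
    print(round(total, 4), rows)   # 3.6376 (0, 2, 4, 1) down to 3.4755 (2, 2, 4, 1)
```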

Solving this for arbitrary N and q, the naive approach would be to calculate all N^q sums, sort them, then take the top k results. A first optimisation would be to sort each column, then only calculate sums from combinations of the top k values in each column, reducing the number of candidates to k^q.
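That first optimisation can be sketched as follows (an illustrative helper, not the final version; the function name is mine):

```python
import numpy as np
from itertools import product

def topk_sums_sorted_columns(matrix, k):
    """Top-k column-wise sums, searching only each column's k best entries
    (k**q candidates instead of N**q)."""
    kk = min(k, matrix.shape[0])
    order = np.argsort(-matrix, axis=0)[:kk]        # (kk, q): row indices, best first
    cols = np.arange(matrix.shape[1])
    candidates = []
    for ranks in product(range(kk), repeat=matrix.shape[1]):
        rows = order[list(ranks), cols]             # actual row index per column
        candidates.append((matrix[rows, cols].sum(), rows.tolist()))
    candidates.sort(reverse=True)
    return candidates[:k]
```

On the 5x4 example this prunes nothing (k = N = 5), but for large N it caps the candidate count at k^q.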

However, given that this top-scoring function must be called k times at every time-step of the beam search, every possible speedup is vital if one wishes to scale to high k or high q. The best solution I've come up with (condensed to a minimal example, assuming matrix is a numpy array of shape (N, q), and taking q to be 4):

import numpy as np
from itertools import combinations


class Beamsearch:
    def __init__(self, klen, q=4):
        self.klen = klen
        # All per-column rank-offset combinations whose offsets sum to less
        # than klen: any candidate with offset sum >= klen is preceded by at
        # least klen candidates scoring at least as high.
        self.combis = []
        for p in range(klen):
            self.combis.extend(self.partition(p, q))
        self.wdth = list(range(q))

    def partition(self, N, size):
        # Stars and bars: all lists of `size` non-negative integers summing to N.
        n = N + size - 1
        for splits in combinations(range(n), size - 1):
            yield [s1 - s0 - 1 for s0, s1 in zip((-1,) + splits, splits + (n,))]

    def getkmaxscores(self, matrix):
        # Rank each column's rows, best first.
        matrix_argsort = np.argsort(-matrix, axis=0)
        sums = []
        for comb in self.combis:
            # Map per-column rank offsets to actual row indices.
            midxs = matrix_argsort[comb, self.wdth]
            sums.append((matrix[midxs, self.wdth].sum(), midxs.tolist()))
        sums.sort(reverse=True)
        return sums[:self.klen]
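Run on the example matrix, this reproduces the listed top-5. A usage sketch (the class is inlined so the snippet runs on its own):

```python
import numpy as np
from itertools import combinations

class Beamsearch:                       # condensed copy of the class above
    def __init__(self, klen, q=4):
        self.klen = klen
        self.combis = []
        for p in range(klen):
            self.combis.extend(self.partition(p, q))
        self.wdth = list(range(q))

    def partition(self, N, size):
        n = N + size - 1
        for splits in combinations(range(n), size - 1):
            yield [s1 - s0 - 1 for s0, s1 in zip((-1,) + splits, splits + (n,))]

    def getkmaxscores(self, matrix):
        matrix_argsort = np.argsort(-matrix, axis=0)
        sums = []
        for comb in self.combis:
            midxs = matrix_argsort[comb, self.wdth]
            sums.append((matrix[midxs, self.wdth].sum(), midxs.tolist()))
        sums.sort(reverse=True)
        return sums[:self.klen]

matrix = np.array([
    [0.9763, 0.0791, 0.1530, 0.5565],
    [0.1560, 0.1014, 0.6932, 0.7551],
    [0.8142, 0.9494, 0.4582, 0.4411],
    [0.3807, 0.2403, 0.6897, 0.7356],
    [0.0156, 0.9419, 0.9568, 0.2266],
])
for total, rows in Beamsearch(klen=5).getkmaxscores(matrix):
    print(round(total, 4), rows)   # 3.6376 [0, 2, 4, 1] down to 3.4755 [2, 2, 4, 1]
```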

This method generates all compositions of an integer p into q non-negative parts, for each 0 ≤ p < k, e.g. for q=4:

p0: [0, 0, 0, 0]
p1: [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]
p2: [0, 0, 0, 2], [0, 0, 1, 1], [0, 0, 2, 0], [0, 1, 0, 1], [0, 1, 1, 0], [0, 2, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0], [2, 0, 0, 0]

etc.
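These counts can be checked standalone (a sketch; `compositions` is my stand-in name for the partition method above):

```python
from itertools import combinations
from math import comb

def compositions(p, size):
    """All lists of `size` non-negative integers summing to p (stars and bars)."""
    n = p + size - 1
    for splits in combinations(range(n), size - 1):
        yield [s1 - s0 - 1 for s0, s1 in zip((-1,) + splits, splits + (n,))]

print(list(compositions(1, 4)))
# [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]

# For q = 4 the counts are the tetrahedral numbers C(p+3, 3): 1, 4, 10, 20, 35, ...
for p in range(5):
    assert len(list(compositions(p, 4))) == comb(p + 3, 3)
```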

These are then used to index the argsorted input matrix, selecting each combination for summation. For q=4, the number of compositions of p follows the tetrahedral sequence (https://oeis.org/A000292), so the total search space, summed over p = 0…k-1, is the binomial coefficient C(k+3, 4) = (k+3)(k+2)(k+1)k/24 (https://oeis.org/A000332). This is a vast improvement over the k^4 solution for small k (for k < 18, it is still below k^3), but it still grows on the order of k^4. Does there exist a solution to the arbitrary case with complexity below O(k^q)?

Monomaniac asked Nov 23 '25 22:11


1 Answer

This problem is known in the literature as selection from X + Y. The canonical reference is Frederickson and Johnson, who gave an optimal O(k)-time algorithm when X and Y are sorted. Your columns are not sorted, and F&J's algorithm is quite complicated, so let me sketch the simpler O(k log k) algorithm.

First, for both X and Y, select the top k elements and sort them in descending order. Initialize a max-heap where the priority of a pair (i, j) is X[i] + Y[j], and insert (0, 0). Repeat the following k times: pop the max element (i, j) and record its priority; insert (i, j+1); if j = 0, also insert (i+1, 0). This all takes O(n + k log k) time, where n is the number of elements in a column.
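That heap loop, sketched in Python (heapq is a min-heap, so priorities are negated; the function name is mine, and X and Y are assumed already truncated to their top k and sorted descending):

```python
import heapq

def top_k_pair_sums(X, Y, k):
    """Top k values of X[i] + Y[j]; X and Y sorted descending."""
    heap = [(-(X[0] + Y[0]), 0, 0)]         # max-heap via negated priorities
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        if j + 1 < len(Y):                  # successor in the same row
            heapq.heappush(heap, (-(X[i] + Y[j + 1]), i, j + 1))
        if j == 0 and i + 1 < len(X):       # open the next row exactly once
            heapq.heappush(heap, (-(X[i + 1] + Y[0]), i + 1, 0))
    return out

print(top_k_pair_sums([9, 7, 4], [8, 3, 1], 4))   # → [17, 15, 12, 12]
```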

Finally, let's reduce the problem to two columns. If there are more than two, e.g., X, Y, Z, then we can select the top k elements from X + Y, and then select the top k from (X + Y) + Z, and so on.
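The reduction can be sketched as follows (for brevity this uses a plain k×k outer sum and sort at each fold step, O(q·k² log k), where the heap step above would be the asymptotically better replacement for the inner two lines; the function name is mine):

```python
import numpy as np

def top_k_column_sums(matrix, k):
    """Top-k sums taking one value from each column: fold the columns left to
    right, keeping only the k best partial sums after each step."""
    cols = [np.sort(col)[::-1][:k] for col in matrix.T]    # each column, descending
    acc = cols[0]
    for col in cols[1:]:
        pair_sums = (acc[:, None] + col[None, :]).ravel()  # all |acc| * |col| sums
        acc = np.sort(pair_sums)[::-1][:k]                 # keep the k largest
    return acc
```

Keeping only the top k partial sums at each step is safe: any overall top-k sum must extend one of the top-k partial sums over the columns folded so far.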

David Eisenstat answered Nov 25 '25 12:11


