Top-k scoring across multiple dimensions

In machine learning, e.g. natural language processing, a beam search is often used to predict and rank candidate next items to append to a sequence. A key part of the beam search is the top-k scoring step, which is effectively: given a list of N probability scores, return the k highest-scoring of the N items. This is as simple as sorting the list and taking the top k values.
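For a single list, that step might look like the following sketch (names are illustrative; np.argpartition places the k largest in the first k slots in O(N), so only those k need sorting):

```python
import numpy as np

scores = np.array([0.1, 0.9, 0.4, 0.7, 0.2])
k = 3

# Partition so the k largest land in the first k slots (O(N)), then sort just those.
top_idx = np.argpartition(-scores, k)[:k]
top_idx = top_idx[np.argsort(-scores[top_idx])]
print(top_idx, scores[top_idx])   # → [1 3 2] [0.9 0.7 0.4]
```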
Referring to a visual example of a beam search https://www.researchgate.net/figure/A-partially-completed-beam-search-procedure-with-a-beam-width-of-5-for-an-example-input_fig2_317377611 (there, k=5, and a 'top' score is a minimal value): at each iteration, each node selects the top k items from its N choices, giving k^2 potential paths in total. From these paths, the top k overall are kept, and these form the nodes for the next iteration. In that figure, only the filtered nodes are shown at each time-step. https://d2l.ai/_images/beam-search.svg works through the case k=2, N=5 comprehensively.

Imagine, instead of optimising one choice from N for each branch/node, you had to choose multiple values: when exploring from a node, you have a matrix of choices of shape (N, q), from which you want to select q values, one from each of the q columns. To find the highest-scoring sets of choices, you then need to consider combinations of the values across these columns. For example, for a matrix of choices with N=5, q=4:

+---+--------+--------+--------+--------+
| N |   q0   |   q1   |   q2   |   q3   |
+---+--------+--------+--------+--------+
| 0 | 0.9763 | 0.0791 | 0.1530 | 0.5565 |
| 1 | 0.1560 | 0.1014 | 0.6932 | 0.7551 |
| 2 | 0.8142 | 0.9494 | 0.4582 | 0.4411 |
| 3 | 0.3807 | 0.2403 | 0.6897 | 0.7356 |
| 4 | 0.0156 | 0.9419 | 0.9568 | 0.2266 |
+---+--------+--------+--------+--------+

If k=5, this top-k function should return the following:

  1. 3.6376 = q0[0] + q1[2] + q2[4] + q3[1]
  2. 3.6301 = q0[0] + q1[4] + q2[4] + q3[1]
  3. 3.6181 = q0[0] + q1[2] + q2[4] + q3[3]
  4. 3.6106 = q0[0] + q1[4] + q2[4] + q3[3]
  5. 3.4755 = q0[2] + q1[2] + q2[4] + q3[1]

which are the largest possible sums, using one value from each column.
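These can be verified by brute force: enumerate all N^q = 625 index combinations and keep the five largest sums (a throwaway check, not the algorithm):

```python
import numpy as np
from itertools import product

matrix = np.array([
    [0.9763, 0.0791, 0.1530, 0.5565],
    [0.1560, 0.1014, 0.6932, 0.7551],
    [0.8142, 0.9494, 0.4582, 0.4411],
    [0.3807, 0.2403, 0.6897, 0.7356],
    [0.0156, 0.9419, 0.9568, 0.2266],
])
N, q = matrix.shape
cols = np.arange(q)

# One row index per column: N**q candidate sums in total.
all_sums = sorted(
    ((matrix[list(rows), cols].sum(), rows) for rows in product(range(N), repeat=q)),
    reverse=True,
)
for total, rows in all_sums[:5]:
    print(round(total, 4), rows)   # 3.6376 (0, 2, 4, 1) down to 3.4755 (2, 2, 4, 1)
```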

Solving this for arbitrary N and q, the naive approach would be to calculate all N^q sums, sort them, then take the top k results. A first optimisation would be to sort each column, then only calculate sums from combinations of the top k values in each column, reducing the number of candidates to k^q.
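That first optimisation can be sketched as follows (an illustrative helper, not the final version; the function name is mine):

```python
import numpy as np
from itertools import product

def topk_sums_sorted_columns(matrix, k):
    """Top-k column-wise sums, searching only each column's k best entries
    (k**q candidates instead of N**q)."""
    kk = min(k, matrix.shape[0])
    order = np.argsort(-matrix, axis=0)[:kk]        # (kk, q): row indices, best first
    cols = np.arange(matrix.shape[1])
    candidates = []
    for ranks in product(range(kk), repeat=matrix.shape[1]):
        rows = order[list(ranks), cols]             # actual row index per column
        candidates.append((matrix[rows, cols].sum(), rows.tolist()))
    candidates.sort(reverse=True)
    return candidates[:k]
```

On the 5x4 example this prunes nothing (k = N = 5), but for large N it caps the candidate count at k^q.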

However, given that this top-scoring function must be called k times at every time-step of the beam search, every possible speedup is vital if one wishes to scale to high k or high q. The best solution I've come up with (condensed to a minimal example, assuming matrix is a numpy array of shape (N, q), and taking q to be 4):

import numpy as np
from itertools import combinations


class Beamsearch:
    def __init__(self, klen, q=4):
        self.klen = klen
        # All per-column rank-offset combinations whose offsets sum to less
        # than klen: any candidate with offset sum >= klen is preceded by at
        # least klen candidates scoring at least as high.
        self.combis = []
        for p in range(klen):
            self.combis.extend(self.partition(p, q))
        self.wdth = list(range(q))

    def partition(self, N, size):
        # Stars and bars: all lists of `size` non-negative integers summing to N.
        n = N + size - 1
        for splits in combinations(range(n), size - 1):
            yield [s1 - s0 - 1 for s0, s1 in zip((-1,) + splits, splits + (n,))]

    def getkmaxscores(self, matrix):
        # Rank each column's rows, best first.
        matrix_argsort = np.argsort(-matrix, axis=0)
        sums = []
        for comb in self.combis:
            # Map per-column rank offsets to actual row indices.
            midxs = matrix_argsort[comb, self.wdth]
            sums.append((matrix[midxs, self.wdth].sum(), midxs.tolist()))
        sums.sort(reverse=True)
        return sums[:self.klen]
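Run on the example matrix, this reproduces the listed top-5. A usage sketch (the class is inlined so the snippet runs on its own):

```python
import numpy as np
from itertools import combinations

class Beamsearch:                       # condensed copy of the class above
    def __init__(self, klen, q=4):
        self.klen = klen
        self.combis = []
        for p in range(klen):
            self.combis.extend(self.partition(p, q))
        self.wdth = list(range(q))

    def partition(self, N, size):
        n = N + size - 1
        for splits in combinations(range(n), size - 1):
            yield [s1 - s0 - 1 for s0, s1 in zip((-1,) + splits, splits + (n,))]

    def getkmaxscores(self, matrix):
        matrix_argsort = np.argsort(-matrix, axis=0)
        sums = []
        for comb in self.combis:
            midxs = matrix_argsort[comb, self.wdth]
            sums.append((matrix[midxs, self.wdth].sum(), midxs.tolist()))
        sums.sort(reverse=True)
        return sums[:self.klen]

matrix = np.array([
    [0.9763, 0.0791, 0.1530, 0.5565],
    [0.1560, 0.1014, 0.6932, 0.7551],
    [0.8142, 0.9494, 0.4582, 0.4411],
    [0.3807, 0.2403, 0.6897, 0.7356],
    [0.0156, 0.9419, 0.9568, 0.2266],
])
for total, rows in Beamsearch(klen=5).getkmaxscores(matrix):
    print(round(total, 4), rows)   # 3.6376 [0, 2, 4, 1] down to 3.4755 [2, 2, 4, 1]
```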

This method generates all compositions of an integer p into q non-negative parts, for each 0 ≤ p < k, e.g. for q=4:

p0: [0, 0, 0, 0]
p1: [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]
p2: [0, 0, 0, 2], [0, 0, 1, 1], [0, 0, 2, 0], [0, 1, 0, 1], [0, 1, 1, 0], [0, 2, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0], [2, 0, 0, 0]

etc.
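These counts can be checked standalone (a sketch; `compositions` is my stand-in name for the partition method above):

```python
from itertools import combinations
from math import comb

def compositions(p, size):
    """All lists of `size` non-negative integers summing to p (stars and bars)."""
    n = p + size - 1
    for splits in combinations(range(n), size - 1):
        yield [s1 - s0 - 1 for s0, s1 in zip((-1,) + splits, splits + (n,))]

print(list(compositions(1, 4)))
# [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]

# For q = 4 the counts are the tetrahedral numbers C(p+3, 3): 1, 4, 10, 20, 35, ...
for p in range(5):
    assert len(list(compositions(p, 4))) == comb(p + 3, 3)
```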

These are then used to index the argsorted input matrix, selecting each combination for summation. For q=4, the number of compositions of p follows the tetrahedral sequence (https://oeis.org/A000292), so the total search space, summed over p = 0…k-1, is the binomial coefficient C(k+3, 4) = (k+3)(k+2)(k+1)k/24 (https://oeis.org/A000332). This is a vast improvement over the k^4 solution for small k (for k < 18, it is still below k^3), but it still grows on the order of k^4. Does there exist a solution to the arbitrary case with complexity below O(k^q)?

Monomaniac asked Nov 23 '25 22:11


1 Answer

This problem is known in the literature as selection from X + Y. The canonical reference is Frederickson and Johnson, who gave an optimal O(k)-time algorithm when X and Y are sorted. Your columns are not sorted, and F&J's algorithm is quite complicated, so let me sketch the simpler O(k log k) algorithm.

First, for both X and Y, select the top k elements and sort them in descending order. Initialize a max-heap where the priority of a pair (i, j) is X[i] + Y[j], and insert (0, 0). Repeat the following k times: pop the max element (i, j) and record its priority; insert (i, j+1); if j = 0, also insert (i+1, 0). This all takes O(n + k log k) time, where n is the number of elements in a column.
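That heap loop, sketched in Python (heapq is a min-heap, so priorities are negated; the function name is mine, and X and Y are assumed already truncated to their top k and sorted descending):

```python
import heapq

def top_k_pair_sums(X, Y, k):
    """Top k values of X[i] + Y[j]; X and Y sorted descending."""
    heap = [(-(X[0] + Y[0]), 0, 0)]         # max-heap via negated priorities
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        if j + 1 < len(Y):                  # successor in the same row
            heapq.heappush(heap, (-(X[i] + Y[j + 1]), i, j + 1))
        if j == 0 and i + 1 < len(X):       # open the next row exactly once
            heapq.heappush(heap, (-(X[i + 1] + Y[0]), i + 1, 0))
    return out

print(top_k_pair_sums([9, 7, 4], [8, 3, 1], 4))   # → [17, 15, 12, 12]
```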

Finally, let's reduce the problem to two columns. If there are more than two, e.g., X, Y, Z, then we can select the top k elements from X + Y, and then select the top k from (X + Y) + Z, and so on.
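The reduction can be sketched as follows (for brevity this uses a plain k×k outer sum and sort at each fold step, O(q·k² log k), where the heap step above would be the asymptotically better replacement for the inner two lines; the function name is mine):

```python
import numpy as np

def top_k_column_sums(matrix, k):
    """Top-k sums taking one value from each column: fold the columns left to
    right, keeping only the k best partial sums after each step."""
    cols = [np.sort(col)[::-1][:k] for col in matrix.T]    # each column, descending
    acc = cols[0]
    for col in cols[1:]:
        pair_sums = (acc[:, None] + col[None, :]).ravel()  # all |acc| * |col| sums
        acc = np.sort(pair_sums)[::-1][:k]                 # keep the k largest
    return acc
```

Keeping only the top k partial sums at each step is safe: any overall top-k sum must extend one of the top-k partial sums over the columns folded so far.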

David Eisenstat answered Nov 25 '25 12:11


