Repeatedly removing the maximum average subarray

Question

I have an array of positive integers. For example:

[1, 7, 8, 4, 2, 1, 4]

A "reduction operation" finds the array prefix with the highest average, and deletes it. Here, an array prefix means a contiguous subarray whose left end is the start of the array, such as [1] or [1, 7] or [1, 7, 8] above. Ties are broken by taking the longer prefix.

Original array:  [  1,   7,   8,   4,   2,   1,   4]

Prefix averages: [1.0, 4.0, 5.3, 5.0, 4.4, 3.8, 3.9]

-> Delete [1, 7, 8], with maximum average 5.3
-> New array -> [4, 2, 1, 4]

I will repeat the reduction operation until the array is empty:

[1, 7, 8, 4, 2, 1, 4]
^       ^
[4, 2, 1, 4]
^ ^
[2, 1, 4]
^       ^
[]

Now, actually performing these array modifications isn't necessary; I'm only looking for the list of lengths of prefixes that would be deleted by this process, for example, [3, 1, 3] above.

What is an efficient algorithm for computing these prefix lengths?

The naive approach is to recompute all sums and averages from scratch in every iteration for an O(n^2) algorithm-- I've attached Python code for this below. I'm looking for any improvement on this approach-- most preferably, any solution below O(n^2), but an algorithm with the same complexity but better constant factors would also be helpful.

Here are a few of the things I've tried (without success):

Dynamically maintaining prefix sums, for example with a Binary Indexed Tree. While I can easily update prefix sums or find a maximum prefix sum in O(log n) time, I haven't found any data structure which can update the average, as the denominator in the average is changing.
Reusing the previous 'rankings' of prefix averages-- these rankings can change, e.g. in some array, the prefix ending at index 5 may have a larger average than the prefix ending at index 6, but after removing the first 3 elements, now the prefix ending at index 2 may have a smaller average than the one ending at 3.
Looking for patterns in where prefixes end; for example, the rightmost element of any max average prefix is always a local maximum in the array, but it's not clear how much this helps.

This is a working Python implementation of the naive, quadratic method:

from fractions import Fraction
def find_array_reductions(nums: List[int]) -> List[int]:
    """Return list of lengths of max average prefix reductions."""

    def max_prefix_avg(arr: List[int]) -> Tuple[float, int]:
        """Return value and length of max average prefix in arr."""
        if len(arr) == 0:
            return (-math.inf, 0)

        best_length = 1
        best_average = Fraction(0, 1)
        running_sum = 0

        for i, x in enumerate(arr, 1):
            running_sum += x
            new_average = Fraction(running_sum, i)
            if new_average >= best_average:
                best_average = new_average
                best_length = i

        return (float(best_average), best_length)

    removed_lengths = []
    total_removed = 0

    while total_removed < len(nums):
        _, new_removal = max_prefix_avg(nums[total_removed:])
        removed_lengths.append(new_removal)
        total_removed += new_removal

    return removed_lengths

Edit: The originally published code had a rare error with large inputs from using Python's math.isclose() with default parameters for floating point comparison, rather than proper fraction comparison. This has been fixed in the current code. An example of the error can be found at this Try it online link, along with a foreword explaining exactly what causes this bug, if you're curious.

Matt Timmermans · Accepted Answer

This problem has a fun O(n) solution.

If you draw a graph of cumulative sum vs index, then:

The average value in the subarray between any two indexes is the slope of the line between those points on the graph.

The first highest-average-prefix will end at the point that makes the highest angle from 0. The next highest-average-prefix must then have a smaller average, and it will end at the point that makes the highest angle from the first ending. Continuing to the end of the array, we find that...

These segments of highest average are exactly the segments in the upper convex hull of the cumulative sum graph.

Find these segments using the monotone chain algorithm. Since the points are already sorted, it takes O(n) time.

# Lengths of the segments in the upper convex hull
# of the cumulative sum graph
def upperSumHullLengths(arr):
    if len(arr) < 2:
        if len(arr) < 1:
            return []
        else:
            return [1]
    
    hull = [(0, 0),(1, arr[0])]
    for x in range(2, len(arr)+1):
        # this has x coordinate x-1
        prevPoint = hull[len(hull) - 1]
        # next point in cumulative sum
        point = (x, prevPoint[1] + arr[x-1])
        # remove points not on the convex hull
        while len(hull) >= 2:
            p0 = hull[len(hull)-2]
            dx0 = prevPoint[0] - p0[0]
            dy0 = prevPoint[1] - p0[1]
            dx1 = x - prevPoint[0]
            dy1 = point[1] - prevPoint[1]
            if dy1*dx0 < dy0*dx1:
                break
            hull.pop()
            prevPoint = p0
        hull.append(point)
    
    return [hull[i+1][0] - hull[i][0] for i in range(0, len(hull)-1)]


print(upperSumHullLengths([  1,   7,   8,   4,   2,   1,   4]))

prints:

[3, 1, 3]

Repeatedly removing the maximum average subarray

Tags:

python

arrays

algorithm

data-structures

kcsquared

1 Answers

Matt Timmermans

Recent Activity

Donate For Us

Repeatedly removing the maximum average subarray

Tags:

python

arrays

algorithm

data-structures

kcsquared

1 Answers

Matt Timmermans

Related questions

Recent Activity

Donate For Us