I have an array of positive integers. For example:
[1, 7, 8, 4, 2, 1, 4]
A "reduction operation" finds the array prefix with the highest average, and deletes it. Here, an array prefix means a contiguous subarray whose left end is the start of the array, such as [1]
or [1, 7]
or [1, 7, 8]
above. Ties are broken by taking the longer prefix.
Original array: [ 1, 7, 8, 4, 2, 1, 4]
Prefix averages: [1.0, 4.0, 5.3, 5.0, 4.4, 3.8, 3.9]
-> Delete [1, 7, 8], with maximum average 5.3
-> New array -> [4, 2, 1, 4]
I will repeat the reduction operation until the array is empty:
[1, 7, 8, 4, 2, 1, 4]
^ ^
[4, 2, 1, 4]
^ ^
[2, 1, 4]
^ ^
[]
Now, actually performing these array modifications isn't necessary; I'm only looking for the list of lengths of prefixes that would be deleted by this process, for example, [3, 1, 3]
above.
What is an efficient algorithm for computing these prefix lengths?
The naive approach is to recompute all sums and averages from scratch in every iteration for an O(n^2)
algorithm-- I've attached Python code for this below. I'm looking for any improvement on this approach-- most preferably, any solution below O(n^2)
, but an algorithm with the same complexity but better constant factors would also be helpful.
Here are a few of the things I've tried (without success):
O(log n)
time, I haven't found any data structure which can update the average, as the denominator in the average is changing.5
may have a larger average than the prefix ending at index 6
, but after removing the first 3 elements, now the prefix ending at index 2
may have a smaller average than the one ending at 3
.This is a working Python implementation of the naive, quadratic method:
from fractions import Fraction
def find_array_reductions(nums: List[int]) -> List[int]:
"""Return list of lengths of max average prefix reductions."""
def max_prefix_avg(arr: List[int]) -> Tuple[float, int]:
"""Return value and length of max average prefix in arr."""
if len(arr) == 0:
return (-math.inf, 0)
best_length = 1
best_average = Fraction(0, 1)
running_sum = 0
for i, x in enumerate(arr, 1):
running_sum += x
new_average = Fraction(running_sum, i)
if new_average >= best_average:
best_average = new_average
best_length = i
return (float(best_average), best_length)
removed_lengths = []
total_removed = 0
while total_removed < len(nums):
_, new_removal = max_prefix_avg(nums[total_removed:])
removed_lengths.append(new_removal)
total_removed += new_removal
return removed_lengths
Edit: The originally published code had a rare error with large inputs from using Python's math.isclose()
with default parameters for floating point comparison, rather than proper fraction comparison. This has been fixed in the current code. An example of the error can be found at this Try it online link, along with a foreword explaining exactly what causes this bug, if you're curious.
This problem has a fun O(n) solution.
If you draw a graph of cumulative sum vs index, then:
The average value in the subarray between any two indexes is the slope of the line between those points on the graph.
The first highest-average-prefix will end at the point that makes the highest angle from 0. The next highest-average-prefix must then have a smaller average, and it will end at the point that makes the highest angle from the first ending. Continuing to the end of the array, we find that...
These segments of highest average are exactly the segments in the upper convex hull of the cumulative sum graph.
Find these segments using the monotone chain algorithm. Since the points are already sorted, it takes O(n) time.
# Lengths of the segments in the upper convex hull
# of the cumulative sum graph
def upperSumHullLengths(arr):
if len(arr) < 2:
if len(arr) < 1:
return []
else:
return [1]
hull = [(0, 0),(1, arr[0])]
for x in range(2, len(arr)+1):
# this has x coordinate x-1
prevPoint = hull[len(hull) - 1]
# next point in cumulative sum
point = (x, prevPoint[1] + arr[x-1])
# remove points not on the convex hull
while len(hull) >= 2:
p0 = hull[len(hull)-2]
dx0 = prevPoint[0] - p0[0]
dy0 = prevPoint[1] - p0[1]
dx1 = x - prevPoint[0]
dy1 = point[1] - prevPoint[1]
if dy1*dx0 < dy0*dx1:
break
hull.pop()
prevPoint = p0
hull.append(point)
return [hull[i+1][0] - hull[i][0] for i in range(0, len(hull)-1)]
print(upperSumHullLengths([ 1, 7, 8, 4, 2, 1, 4]))
prints:
[3, 1, 3]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With