Consider this array:
a = np.array([1,2,3,4,3,2,1])
I want to get the element that divides the array evenly, i.e the sum of array before the element equals the sum of the array after it. In this case, the 4th element a[3] divides the array evenly. Is there a faster (numpy) way to do that? Or do I have to iterate over all elements?
desired function:
f(a) = 3
If all input values are nonnegative, it seems likely that one of the most efficient ways to do this would be to build a cumulative sum array, then binary search it for the location with half the sum on either side. However, it's also really easy to get such a binary search wrong. While trying to get a binary search working on all edge cases, I wound up with the following tests:
class SplitpointTest(unittest.TestCase):
def testFloatRounding(self):
# Due to rounding error, the cumulative sums for these inputs are
# [1.1, 3.3000000000000003, 3.3000000000000003, 5.5, 6.6]
# and [0.1, 0.7999999999999999, 0.7999999999999999, 1.5, 1.6]
# Note that under default settings, numpy won't display
# enough precision to see that.
self.assertEquals(2, splitpoint([1.1, 2.2, 1e-20, 2.2, 1.1]))
self.assertEquals(2, splitpoint([0.1, 0.7, 1e-20, 0.7, 0.1]))
def testIntRounding(self):
self.assertEquals(1, splitpoint([1, 1, 1]))
def testIntPrecision(self):
self.assertEquals(2, splitpoint([2**60, 1, 1, 1, 2**60]))
def testIntMax(self):
self.assertEquals(
2,
splitpoint(numpy.array([40, 23, 1, 63], dtype=numpy.int8))
)
def testIntZeros(self):
self.assertEquals(
4,
splitpoint(numpy.array([0, 1, 0, 2, 0, 2, 0, 1], dtype=int))
)
def testFloatZeros(self):
self.assertEquals(
4,
splitpoint(numpy.array([0, 1, 0, 2, 0, 2, 0, 1], dtype=float))
)
And I went through the following versions before deciding it wasn't worth it:
def splitpoint(a):
c = numpy.cumsum(a)
return numpy.searchsorted(c, c[-1]/2)
# Fails on [1, 1, 1]
def splitpoint(a):
c = numpy.cumsum(a)
return numpy.searchsorted(c, c[-1]/2.0)
# Fails on [2**60, 1, 1, 1, 2**60]
def splitpoint(a):
c = numpy.cumsum(a)
if c.dtype.kind == 'f':
# Floating-point input.
return numpy.searchsorted(c, c[-1]/2.0)
elif c.dtype.kind in ('i', 'u'):
# Integer input.
return numpy.searchsorted(c, (c[-1]+1)//2)
else:
# Probably an object dtype. No great options.
return numpy.searchsorted(c, c[-1]/2.0)
# Fails on numpy.array([63, 1, 63], dtype=int8)
def splitpoint(a):
c = numpy.cumsum(a)
if c.dtype.kind == 'f':
# Floating-point input.
return numpy.searchsorted(c, c[-1]/2.0)
elif c.dtype.kind in ('i', 'u'):
# Integer input.
return numpy.searchsorted(c, c[-1]//2 + c[-1]%2)
else:
# Probably an object dtype. No great options.
return numpy.searchsorted(c, c[-1]/2.0)
# Still fails the floating-point rounding and zeros tests.
I could probably get this working if I kept trying, but it wouldn't be worth it. chw21's second solution, the one based on explicitly minimizing the absolute difference between the left and right sums, is much easier to reason about and more generally applicable. With the addition of a = numpy.asarray(a), it passes all above test cases and also the following tests, which expand the kind of inputs the algorithm is expected to work on:
class SplitpointGeneralizedTest(unittest.TestCase):
def testNegatives(self):
self.assertEquals(2, splitpoint([-1, 5, 2, 4]))
def testComplex(self):
self.assertEquals(2, splitpoint([1+1j, -5+2j, 43, -4+3j]))
def testObjectDtype(self):
from fractions import Fraction
from decimal import Decimal
self.assertEquals(2, splitpoint(map(Fraction, [1.5, 2.5, 3.5, 4])))
self.assertEquals(2, splitpoint(map(Decimal, [1.5, 2.5, 3.5, 4])))
Unless it is specifically found to be too slow, I would go with chw21's second solution. In the slightly modified form in which I tested it, that would be the following:
def splitpoint(a):
a = np.asarray(a)
c1 = a.cumsum()
c2 = a[::-1].cumsum()[::-1]
return np.argmin(np.abs(c1-c2))
The only flaw I can see is that if the input has an unsigned dtype and there is no index that exactly splits the input, this algorithm might not return the index that is closest to splitting the input, since np.abs(c1-c2) doesn't do the right thing for unsigned data types. It was never specified what the algorithm should do if there is no splitting index, so this behavior is acceptable, though it might be worth making a note about np.abs(c1-c2) and unsigned dtypes in a comment. If we want the index closest to splitting the input, we can get it at the cost of some extra runtime:
def splitpoint(a):
a = np.asarray(a)
c1 = a.cumsum()
c2 = a[::-1].cumsum()[::-1]
if a.dtype.kind == 'u':
# np.abs(c1-c2) doesn't work on unsigned ints
absdiffs = np.where(c1>c2, c1-c2, c2-c1)
else:
# c1>c2 doesn't work on complex input.
# We also use this case for other dtypes, since it's
# probably faster.
absdiffs = np.abs(c1-c2)
return np.argmin(absdiffs)
And of course, here's the test for this behavior, which the modified form passes and the unmodified form fails:
class SplitpointUnsignedTest(unittest.TestCase):
def testBestApproximation(self):
self.assertEquals(1, splitpoint(numpy.array([5, 5, 4, 5], dtype=numpy.uint32)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With