Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the fastest, most efficient, and pythonic way to perform a mathematical sigma sum?

Let's say that I want to perform a mathematical summation, say the Madhava–Leibniz formula for π, in Python:

                                                               generated by latex.codecogs.com/gif.latex?%5Cfrac%7B%5Cpi%7D%7B4%7D%5Capprox%20%5Csum_%7Bk%3D0%7D%5E%7Bn%7D%20%5Cfrac%7B%28-1%29%5Ek%7D%7B2k+1%7D

Within a function called Leibniz_pi(), I could create a loop to calculate the nth partial sum, such as:

def Leibniz_pi(n):
    nth_partial_sum = 0       #initialize the variable
    for i in range(n+1):
        nth_partial_sum += ((-1)**i)/(2*i + 1)
    return nth_partial_sum

I'm assuming it would be faster to use something like xrange() instead of range(). Would it be even faster to use numpy and its built in numpy.sum() method? What would such an example look like?

like image 562
Oatmeal Avatar asked Oct 24 '25 15:10

Oatmeal


2 Answers

I guess most people will define the fastest solution by @zero using only numpy as the most pythonic, but it is certainly not the fastest. With some additional optimizations you can beat the already fast numpy implementation by a factor of 50.

Using only Numpy (@zero)

import numpy as np
import numexpr as ne
import numba as nb

def Leibniz_point(n):
    val = (-1)**n / (2*n + 1)
    return val

%timeit Leibniz_point(np.arange(1000)).sum()
33.8 µs ± 203 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Make use of numexpr

n=np.arange(1000)
%timeit ne.evaluate("sum((-1)**n / (2*n + 1))")
21 µs ± 354 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Compile your function using Numba

# with error_model="numpy", turns off division-by-zero checks 
@nb.njit(error_model="numpy",cache=True)
def Leibniz_pi(n):
  nth_partial_sum = 0.       #initialize the variable as float64
  for i in range(n+1):
      nth_partial_sum += ((-1)**i)/(2*i + 1)
  return nth_partial_sum

%timeit Leibniz_pi(999)
6.48 µs ± 38.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Edit, optimizing away the costly (-1)**n

import numba as nb
import numpy as np

#replacement for the much more costly (-1)**n
@nb.njit()
def sgn(i):
    if i%2>0:
        return -1.
    else:
        return 1.

# with error_model="numpy", turns off the division-by-zero checks
#
# fastmath=True makes SIMD-vectorization in this case possible
# floating point math is in general not commutative
# e.g. calculating four times sgn(i)/(2*i + 1) at once and then the sum 
# is not exactly the same as doing this sequentially, therefore you have to
# explicitly allow the compiler to make the optimizations

@nb.njit(fastmath=True,error_model="numpy",cache=True)
def Leibniz_pi(n):
    nth_partial_sum = 0.       #initialize the variable
    for i in range(n+1):
        nth_partial_sum += sgn(i)/(2*i + 1)
    return nth_partial_sum

%timeit Leibniz_pi(999)
777 ns ± 5.36 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
like image 191
max9111 Avatar answered Oct 27 '25 03:10

max9111


3 suggestions (with speed computation):

define the Leibniz point not the cumulative sum:

def Leibniz_point(n):
    val = (-1)**n / (2*n + 1)
    return val

1) sum a list comprehension

%timeit sum([Leibniz_point(n) for n in range(100)])
58.8 µs ± 825 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit sum([Leibniz_point(n) for n in range(1000)])
667 µs ± 3.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

2) standard for loop

%%timeit
sum = 0
for n in range(100):
    sum += Leibniz_point(n)
61.8 µs ± 4.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
sum = 0
for n in range(1000):
    sum += Leibniz_point(n)
    729 µs ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

3) use a numpy array (suggested)

%timeit Leibniz_point(np.arange(100)).sum()
11.5 µs ± 866 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit Leibniz_point(np.arange(1000)).sum()
61.8 µs ± 3.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
like image 45
zero Avatar answered Oct 27 '25 05:10

zero