Let's say I want to compute a mathematical summation, for example the Madhava–Leibniz formula for π, in Python:

Within a function called Leibniz_pi(), I could create a loop to calculate the nth partial sum, such as:
def Leibniz_pi(n):
    nth_partial_sum = 0  # initialize the variable
    for i in range(n + 1):
        nth_partial_sum += ((-1)**i) / (2*i + 1)
    return nth_partial_sum
I'm assuming it would be faster to use something like xrange() instead of range(). Would it be even faster to use NumPy and its built-in numpy.sum() method? What would such an example look like?
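For reference, the same partial sum can be written with a generator expression; a minimal sketch assuming Python 3, where range() is already lazy and xrange() no longer exists (leibniz_pi_gen is a hypothetical name, not from the code above):

def leibniz_pi_gen(n):
    # sum the series lazily; no intermediate list is built
    return sum((-1)**i / (2*i + 1) for i in range(n + 1))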
Most people would probably consider the pure-NumPy solution by @zero the most Pythonic, but it is certainly not the fastest. With some additional optimizations you can beat the already fast NumPy implementation by a factor of roughly 50.
Using only Numpy (@zero)
import numpy as np
import numexpr as ne
import numba as nb
def Leibniz_point(n):
    val = (-1)**n / (2*n + 1)
    return val
%timeit Leibniz_point(np.arange(1000)).sum()
33.8 µs ± 203 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
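As an aside, the (-1)**n factor is also the expensive part of the NumPy version (the edit below optimizes it away in the Numba variant). A branch-free alternative, as a sketch of my own rather than part of @zero's answer, builds the signs with slice assignment instead:

n = np.arange(1000)
terms = 1.0 / (2*n + 1)  # all terms positive, float64
terms[1::2] *= -1.0      # flip the sign of every odd-indexed term
terms.sum()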
Make use of numexpr
n = np.arange(1000)
%timeit ne.evaluate("sum((-1)**n / (2*n + 1))")
21 µs ± 354 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
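numexpr evaluates the whole expression in a single pass without materializing intermediate arrays, and it is multithreaded by default, so part of the speedup may come from using several cores. To check how much, you can pin it to one thread (set_num_threads is part of numexpr's public API):

ne.set_num_threads(1)  # single-threaded run, for a fair per-core comparison
%timeit ne.evaluate("sum((-1)**n / (2*n + 1))")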
Compile your function using Numba
# error_model="numpy" turns off the division-by-zero checks
@nb.njit(error_model="numpy", cache=True)
def Leibniz_pi(n):
    nth_partial_sum = 0.  # initialize the variable as float64
    for i in range(n + 1):
        nth_partial_sum += ((-1)**i) / (2*i + 1)
    return nth_partial_sum
%timeit Leibniz_pi(999)
6.48 µs ± 38.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
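One caveat not mentioned above: the first call to an njit-compiled function includes the compilation time. %timeit hides this because it calls the function many times, but in a one-shot script you may want to warm the function up once before timing it:

Leibniz_pi(1)            # first call triggers JIT compilation (cached via cache=True)
%timeit Leibniz_pi(999)  # subsequent calls run the compiled machine code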
Edit: optimizing away the costly (-1)**n
import numba as nb
import numpy as np

# replacement for the much more costly (-1)**n
@nb.njit()
def sgn(i):
    if i % 2 > 0:
        return -1.
    else:
        return 1.

# error_model="numpy" turns off the division-by-zero checks
#
# fastmath=True makes SIMD vectorization possible in this case.
# Floating-point addition is in general not associative:
# calculating four values of sgn(i)/(2*i + 1) at once and then summing
# them is not exactly the same as accumulating them sequentially, so you
# have to explicitly allow the compiler to make these optimizations.
@nb.njit(fastmath=True, error_model="numpy", cache=True)
def Leibniz_pi(n):
    nth_partial_sum = 0.  # initialize the variable
    for i in range(n + 1):
        nth_partial_sum += sgn(i) / (2*i + 1)
    return nth_partial_sum
%timeit Leibniz_pi(999)
777 ns ± 5.36 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
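As a sanity check (my addition, not part of the benchmark), multiplying the partial sum by 4 should approach π; the Leibniz series converges slowly, with an error on the order of 1/n after n terms:

import math
approx = 4 * Leibniz_pi(999)
print(approx, abs(approx - math.pi))  # error is around 1e-3 after 1000 terms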
Three suggestions (with timings):
First, define a single Leibniz term rather than the cumulative sum:
def Leibniz_point(n):
    val = (-1)**n / (2*n + 1)
    return val
1) sum a list comprehension
%timeit sum([Leibniz_point(n) for n in range(100)])
58.8 µs ± 825 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit sum([Leibniz_point(n) for n in range(1000)])
667 µs ± 3.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2) standard for loop
%%timeit
total = 0  # "total" rather than "sum", to avoid shadowing the built-in
for n in range(100):
    total += Leibniz_point(n)
61.8 µs ± 4.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
total = 0
for n in range(1000):
    total += Leibniz_point(n)
729 µs ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3) use a numpy array (suggested)
%timeit Leibniz_point(np.arange(100)).sum()
11.5 µs ± 866 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit Leibniz_point(np.arange(1000)).sum()
61.8 µs ± 3.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
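If you also want to watch the approximation converge, np.cumsum gives every partial sum at once; a small sketch beyond the original answer, assuming numpy is imported as np as above:

partial_sums = 4 * np.cumsum(Leibniz_point(np.arange(1000)))
print(partial_sums[[9, 99, 999]])  # π estimates after 10, 100 and 1000 terms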