The result of raising a pandas integer Series to a power seems wrong.
# Standard Python
42**42
# 150130937545296572356771972164254457814047970568738777235893533016064
import pandas as pd
# Pandas series, float dtype
s = pd.Series([12, 42], index=range(2), dtype=float)
s**42
# 0 2.116471e+45
# 1 1.501309e+68
# dtype: float64
# Pandas series, integer dtype
s = pd.Series([12, 42], index=range(2), dtype=int)
s**42
# 0 0
# 1 4121466560160202752
# dtype: int64
How come?
Python integers have arbitrary precision. Pandas integer columns are backed by NumPy int64 values, which overflow past 9223372036854775807 (that is, 2**63 - 1):
import numpy as np
np.array([12, 42])**42
# array([ 0, 4121466560160202752])
Your number is just too big to represent as an int64, the integer type pandas/NumPy use.
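To see exactly where those numbers come from: int64 arithmetic wraps modulo 2**64 (two's complement), so each overflowed result is the true value reduced mod 2**64 and reinterpreted as a signed 64-bit integer. A small sketch in pure Python (the helper as_int64 is just for illustration):

```python
def as_int64(n):
    """Reduce an arbitrary-precision int to its wrapped signed 64-bit value."""
    m = n % 2**64
    return m - 2**64 if m >= 2**63 else m

# 12**42 = 2**84 * 3**42 is divisible by 2**64, hence the exact 0
print(as_int64(12**42))  # 0
print(as_int64(42**42))  # 4121466560160202752
```

This reproduces both pandas results without NumPy, confirming the values are not random garbage but deterministic wrap-around.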
NB. Floating-point values, as their name indicates, have floating precision: they trade exactness for range. With 11 bits for the exponent, a float64 can represent magnitudes up to about 2**1023 (roughly 1.8e308).
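As a quick sanity check on the float side: 42**42 fits comfortably inside float64's range, but the conversion keeps only ~15-17 significant decimal digits, so the exact integer cannot be recovered:

```python
import sys

x = 42**42          # exact, arbitrary-precision Python int
f = float(x)        # in range: float64 maxes out near 1.8e308
print(f"{f:e}")     # 1.501309e+68, matching the pandas float output
print(int(f) == x)  # False: the low-order digits were rounded away
print(sys.float_info.max)
```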
To give you a visual representation, here is a graph of x**10 for the first 200 integers. You can clearly see the effect of the overflow after ~78: it looks random, but it isn't; the values wrap around to negative, then back to positive.
import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.arange(1, 200)**10)
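Rather than eyeballing the plot, the wrap-around point can be found directly: the first base whose 10th power exceeds 2**63 - 1 lands in the negative half of the int64 range.

```python
import numpy as np

x = np.arange(1, 200, dtype=np.int64)
y = x**10  # silently wraps modulo 2**64 once past the int64 range

# 78**10 still fits (~8.3e18 < 2**63 - 1), while 79**10 (~9.5e18) does not
print(x[y < 0][0])  # 79: first value to wrap into the negatives
```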

Since Python 3.0, there is no such thing as a max value for int [1]:
The sys.maxint constant was removed, since there is no longer a limit to the value of integers.
By default, Python represents integers with its arbitrary-precision int type, so the result of 42**42 is exact.
But NumPy, which pandas uses under the hood, stores integers as int64, which does have an upper bound: 9223372036854775807, according to their docs [2].
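That bound is easy to query programmatically with np.iinfo, which reports the machine limits of any NumPy integer type:

```python
import numpy as np

# Machine limits for the int64 type pandas uses for integer columns
print(np.iinfo(np.int64).max)  # 9223372036854775807
print(np.iinfo(np.int64).min)  # -9223372036854775808
```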
Sources
[1] https://docs.python.org/3/whatsnew/3.0.html#integers
[2] https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.int64