I am using Python 3.8 and Pandas 1.3. Here is some sample code:
import pandas as pd

data_dc = {'Dates': ['10212021', '11152021', '01142022', '02122022']}
df1 = pd.DataFrame(data_dc)
print(df1['Dates'].astype(int))
Results:
0    10212021
1    11152021
2     1142022
3     2122022
Name: Dates, dtype: int32
I passed the Python type int as the argument to the astype method and expected the dtype of the Dates column to be int64. Instead, I got int32. Is this a bug, or am I doing something wrong? This is easy to work around, but I would like to understand what to expect from the software.
Pandas uses NumPy data types under the hood. From the NumPy documentation:
The default NumPy behavior is to create arrays in either 32 or 64-bit signed integers (platform dependent and matches C int size) or double precision floating point numbers, int32/int64 and float, respectively. If you expect your integer arrays to be a specific type, then you need to specify the dtype while you create the array.
It is not a bug. If you need a specific integer width, or want your code to behave the same on every platform, specify the dtype explicitly (for example, astype('int64')) instead of passing int. To rephrase your question: what is np.dtype(int) on my platform?
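One way to answer that question on your own machine is to ask NumPy directly. The snippet below is a minimal sketch; the variable names are only illustrative. On Windows with NumPy releases before 2.0 the first line typically prints int32, while on most Linux and macOS installs it prints int64.

```python
import numpy as np

# What the Python type `int` maps to on this platform.
platform_int = np.dtype(int)
print(platform_int, platform_int.itemsize * 8, "bits")

# Requesting a fixed-width dtype gives the same result everywhere.
explicit = np.array([2_147_483_648], dtype=np.int64)
print(explicit.dtype)
```

Passing np.int64 (or the string 'int64') removes the platform dependence, so a value just above the 32-bit limit is stored without error.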
On Windows, as some of the comments suggest, it appears to be a C signed long, which is 32 bits there. You can even get NumPy to raise an OverflowError to confirm this:
>>> import numpy as np
>>> np.array([2_147_483_648], dtype=int)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
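Applied to the DataFrame from the question, the same idea looks like the sketch below. It assumes the strings are all plain digit sequences, as in the sample data; the name dates_int64 is only for illustration.

```python
import pandas as pd

data_dc = {'Dates': ['10212021', '11152021', '01142022', '02122022']}
df1 = pd.DataFrame(data_dc)

# Name the width explicitly instead of relying on what `int` maps to.
dates_int64 = df1['Dates'].astype('int64')
print(dates_int64.dtype)  # int64
```

Note that the leading zeros in '01142022' and '02122022' are still dropped, since the values are stored as integers.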