If I have a pandas.DataFrame with columns of different types (e.g. int64 and float64), getting a single element from the int column with .loc indexing converts the output to float:
import pandas as pd
df_test = pd.DataFrame({'ints':[1,2,3], 'floats': [4.5,5.5,6.5]})
df_test['ints'].dtype
>>> dtype('int64')
df_test.loc[0,'ints']
>>> 1.0
type(df_test.loc[0,'ints'])
>>> numpy.float64
If I use .at for indexing, it doesn't happen:
type(df_test.at[0,'ints'])
>>> numpy.int64
It also doesn't happen when all the columns are int:
df_test = pd.DataFrame({'ints':[1,2,3], 'ints2': [4,5,6]})
df_test.loc[0,'ints']
>>> 1
Is this a consequence of some core property of pandas indexing? In other words, is it a bug or a feature? :)
Update: Turns out it is a bug, and it is going to be fixed in pandas 0.20.0.
The issue here is that loc implicitly tries to return a Series for the row first. Even though you are selecting a single column, and hence a scalar value from that row, the dtype is upcast to one that can hold all the dtypes in that row. If you select just that column first and then use loc, no conversion happens:
In [83]:
df_test['ints'].loc[0]
Out[83]:
1
You can see what happens when you don't sub-select:
In [84]:
df_test.loc[0]
Out[84]:
floats 4.5
ints 1.0
Name: 0, dtype: float64
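As a minimal sketch of the explanation above, the following reuses df_test from the question and compares the three access paths. The variable names (row, via_column, via_at) are only illustrative. It deliberately does not show the result of df_test.loc[0, 'ints'], since that depends on the pandas version in use.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'ints': [1, 2, 3], 'floats': [4.5, 5.5, 6.5]})

# Selecting a whole row builds a Series, and a Series has one dtype,
# so the int and the float values are both stored as float64.
row = df_test.loc[0]
row_dtype = row.dtype

# Selecting the column first keeps the column's own integer dtype.
via_column = df_test['ints'].loc[0]

# .at looks up the scalar directly, without building a row Series.
via_at = df_test.at[0, 'ints']

print(row_dtype, type(via_column), type(via_at))
```

If the integer type matters, selecting the column first or using .at should avoid the intermediate row Series altogether.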
This may be undesirable, and I think there may be a GitHub issue about it; this issue is somewhat related.