
.loc indexing changes type

If I have a pandas.DataFrame with columns of different types (e.g. int64 and float64), getting a single element from the int column with .loc indexing converts the output to float:

import pandas as pd
df_test = pd.DataFrame({'ints':[1,2,3], 'floats': [4.5,5.5,6.5]})

df_test['ints'].dtype
>>> dtype('int64')

df_test.loc[0,'ints']
>>> 1.0

type(df_test.loc[0,'ints'])
>>> numpy.float64

If I use .at for indexing, it doesn't happen:

type(df_test.at[0,'ints'])
>>> numpy.int64

It also doesn't happen when all the columns are int:

df_test = pd.DataFrame({'ints':[1,2,3], 'ints2': [4,5,6]})
df_test.loc[0,'ints']
>>> 1

Is this a consequence of some core properties of pandas indexing? In other words, is it a bug or a feature? :)

Update: Turns out, it is a bug and it is going to be fixed in pandas 0.20.0.

Sergey Antopolskiy asked Sep 19 '25 05:09



1 Answer

The issue here is that loc implicitly tries to return a Series for the row first, even though you are selecting a single column and hence a scalar value. A Series has a single dtype, so the values in that row are upcast to a dtype that can hold all of them (here float64). If you select just that column first and then use loc, no conversion happens:

In [83]:
df_test['ints'].loc[0]

Out[83]:
1

You can see what happens when you don't sub-select:

In [84]:
df_test.loc[0]

Out[84]:
floats    4.5
ints      1.0
Name: 0, dtype: float64
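
To make the upcasting concrete, here is a minimal sketch that puts the cases side by side. It assumes a reasonably recent pandas and NumPy; the variable names (row, via_column, via_at, all_ints_row) are only illustrative and not from the question. It deliberately does not check df_test.loc[0, 'ints'] itself, since that result depends on the pandas version.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'ints': [1, 2, 3], 'floats': [4.5, 5.5, 6.5]})

# Selecting a whole row builds a Series. A Series has one dtype, so the
# int64 and float64 values are upcast to a common dtype (float64).
row = df_test.loc[0]
row_dtype = row.dtype

# Selecting the column first keeps the column's own dtype,
# so the scalar comes back as an integer.
via_column = df_test['ints'].loc[0]

# .at is a scalar accessor and does not go through a row Series.
via_at = df_test.at[0, 'ints']

# When every column is int, the row Series needs no upcasting.
df_all_ints = pd.DataFrame({'ints': [1, 2, 3], 'ints2': [4, 5, 6]})
all_ints_row = df_all_ints.loc[0]

print(row_dtype, type(via_column), type(via_at), all_ints_row.dtype)
```

If the integer type matters, going through the column (df_test['ints'].loc[0]) or using .at avoids depending on how the row Series is built.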

This may be undesirable, and I think there may be a GitHub issue regarding this.

this issue is kinda related

EdChum answered Sep 22 '25 03:09
