 

pandas DataFrame combine_first method converts booleans to floats

I'm running into a strange issue where the combine_first method causes values stored as bool to be upcast to float64. Example:

In [1]: import pandas as pd

In [2]: df1 = pd.DataFrame({"a": [True]})

In [3]: df2 = pd.DataFrame({"b": ['test']})

In [4]: df2.combine_first(df1)
Out[4]:
     a     b
0  1.0  test

This problem was already reported in a post from 3 years ago: "pandas DataFrame combine_first and update methods have strange behavior". The issue was said to be solved, but I still see this behaviour under pandas 0.18.1.

Thank you for your help.

asked Oct 29 '25 08:10 by RomB


1 Answer

Somewhere along the chain of events that produces a combined DataFrame, potential missing values have to be addressed. I'm aware that nothing is missing in your example's result. However, df2 has no column a, so when the two frames are aligned on the union of their columns, df2's a column is entirely NaN until it is filled from df1. None and np.nan are not int or bool, so for a column to hold both a bool and a None or np.nan, it has to be cast to either object or float. As float, a large number of operations become far more efficient, which makes it a decent choice. It obviously isn't the best choice all of the time, but a choice has to be made nonetheless, and pandas tries to infer the best one.
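The two ingredients described above (an all-NaN column appearing during alignment, and bool being promoted when mixed with float) can be reproduced directly. This is a minimal sketch, not combine_first's actual internals, which differ between pandas versions: reindex plays the role of the column alignment and np.where stands in for the fill step. The names aligned and filled are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [True]})
df2 = pd.DataFrame({"b": ["test"]})

# Align df2 to the union of both frames' columns, as an outer alignment would.
# df2 has no values for "a", so the new column is all NaN and therefore float64.
aligned = df2.reindex(columns=df2.columns.union(df1.columns))
print(aligned.dtypes)

# A bool array cannot hold NaN; NumPy promotes bool combined with float64 to float64.
print(np.result_type(np.bool_, np.float64))

# Filling the NaN slots from df1's bool values therefore yields 1.0 rather than True.
filled = np.where(aligned["a"].isna(), df1["a"].to_numpy(), aligned["a"].to_numpy())
print(filled, filled.dtype)
```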

A workaround:

Setup

import pandas as pd

df1 = pd.DataFrame({"a": [True]})
df2 = pd.DataFrame({"b": ['test']})

# Under pandas 0.18.1, column "a" comes back as float64 (1.0)
df3 = df2.combine_first(df1)
df3


Solution

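# Original dtype per column: df1's where it has the column, otherwise df2's
dtypes = df1.dtypes.combine_first(df2.dtypes)

# Cast each column of the combined frame back to its original dtype
# (.items() replaces .iteritems(), which was removed in pandas 2.0)
for k, v in dtypes.items():
    df3[k] = df3[k].astype(v)

df3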

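The same idea can be wrapped in a small reusable function. This is a sketch under a few assumptions: the name combine_first_keep_dtypes is hypothetical, the caller's dtypes take precedence for shared columns (mirroring combine_first, where the caller's values win), and the dtype map is built with a plain dict merge and applied with a single DataFrame.astype call instead of a per-column loop.

```python
import pandas as pd


def combine_first_keep_dtypes(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: left.combine_first(right), then restore dtypes.

    Only safe when the recast columns contain no missing values.
    """
    combined = left.combine_first(right)
    # Dtype per column: right's first, then left's overrides for shared columns.
    target = {**right.dtypes.to_dict(), **left.dtypes.to_dict()}
    # Every key is a column of `combined`, since combine_first keeps the union.
    return combined.astype(target)


df1 = pd.DataFrame({"a": [True]})
df2 = pd.DataFrame({"b": ["test"]})
df3 = combine_first_keep_dtypes(df2, df1)
print(df3.dtypes)
```

Depending on your pandas version, combine_first may already preserve the bool dtype in this case; the cast is then harmless. Only restore dtypes on columns that end up without missing values: casting a float column that still contains NaN to bool turns every NaN into True, and casting it to int raises an error.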

answered Oct 31 '25 02:10 by piRSquared


