I am trying to replace NaN with None in a pandas DataFrame MultiIndex. It seems that None is converted to NaN in a MultiIndex (but not in other index types).
The following does not work (taken from the question "Replace NaN in DataFrame index"):
import pandas as pd
import numpy as np
df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]], columns=['c1', 'c2', 'c3'])
df.set_index(['c1','c2'], inplace=True)
df.index = pd.MultiIndex.from_frame(df.index.to_frame().fillna(np.nan).replace([np.nan], [None]))
df
c3
c1 c2
a True 1
b True 2
c False 3
d NaN 4
type(df.index[3][1])
<class 'float'>
Neither does this:
index_tuples = [tuple(row) for row in df.index.to_frame().fillna(np.nan).replace([np.nan], [None]).values]
mi = pd.MultiIndex.from_tuples(index_tuples)
mi
MultiIndex([('a', True),
('b', True),
('c', False),
('d', nan)],
)
type(mi[3][1])
<class 'float'>
It seems None is converted to NaN in a MultiIndex.
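A likely explanation, as far as I can tell from the public attributes: a MultiIndex does not keep the original object for each row. It stores each level's distinct non-missing values in `levels` and per-row integer positions in `codes`, and a missing entry (None or NaN alike) is recorded as code -1. When a row is read back, a -1 code comes out as `np.nan`, so there is nowhere for None to survive. The sketch below uses a hypothetical two-column frame that mirrors the question's `c1`/`c2` data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question's c1/c2 columns; c2 holds a real None
frame = pd.DataFrame({"c1": ["a", "b", "c", "d"], "c2": [True, True, False, None]})
print(frame.iloc[3, 1] is None)   # the object column still holds the None object

mi = pd.MultiIndex.from_frame(frame)

# The missing entry is stored as code -1; the level holds only non-missing values
print(mi.codes[1])                # last code is -1
print(list(mi.levels[1]))         # no None or NaN in the level

# Reading the row back turns the -1 code into a float NaN, not None
value = mi[3][1]
print(type(value), value)
```

So the None is already gone by the time `from_frame` or `from_tuples` returns, regardless of what the input frame contained.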
P.S. It works for other index types:
>>> df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]], columns=['c1', 'c2', 'c3'])
>>> df.set_index('c2', inplace=True)
>>> df
c1 c3
c2
True a 1
True b 2
False c 3
NaN d 4
>>> df.index = df.index.fillna(value=np.nan).to_series().replace([np.nan], [None])
>>> df
c1 c3
c2
True a 1
True b 2
False c 3
NaN d 4
>>> type(df.index[3])
<class 'NoneType'>
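The flat-index case presumably behaves differently because a plain object-dtype Index holds the Python objects themselves rather than level codes, so a None placed in it is returned unchanged (the DataFrame display still renders it as NaN). A minimal check, assuming the same four `c2` values as above:

```python
import pandas as pd

# A flat object-dtype Index stores the actual Python objects
flat = pd.Index([True, True, False, None], dtype=object)
print(type(flat[3]))           # <class 'NoneType'>

# isna() still treats None as missing, so it is handled like NaN where it matters
print(flat.isna().tolist())    # [False, False, False, True]
```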
The only way I managed to do it is by manipulating the underlying NumPy array directly. It seems that any assignment of None values to a MultiIndex through pandas results in conversion to NaN:
import pandas as pd
import numpy as np
df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]], columns=['c1', 'c2', 'c3'])
df.set_index(['c1','c2'], inplace=True)
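def replace_nan(x):
    # x is one index entry (a tuple); swap every NaN element for None
    new_x = []
    for v in x:
        try:
            if np.isnan(v):
                new_x.append(None)
            else:
                new_x.append(v)
        except TypeError:
            # np.isnan raises TypeError for non-numeric values such as strings
            new_x.append(v)
    return tuple(new_x)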
print('Before:\n', df.index)
idx = df.index.values
idx[:] = np.vectorize(replace_nan, otypes=['object'])(idx) # Replace values in np.array
print('After:\n', df.index)
Result:
Before:
MultiIndex([('a', True),
('b', True),
('c', False),
('d', nan)],
names=['c1', 'c2'])
After:
MultiIndex([('a', True),
('b', True),
('c', False),
('d', None)],
names=['c1', 'c2'])
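Writing into `df.index.values` is not a documented way to modify an index, and it leaves the MultiIndex's `codes` and `levels` untouched, so the entry is presumably still stored as missing and operations that go through them (for example `get_level_values`) may still return NaN. If None is only needed when reading the index, a less invasive option is to convert on the way out. A sketch using a hypothetical helper name, `tuples_with_none`, and assuming scalar level values; `pd.isna` replaces the `np.isnan` try/except because it accepts strings and other non-numeric scalars directly:

```python
import pandas as pd

def tuples_with_none(index):
    """Hypothetical helper: return index entries as tuples with NaN swapped for None.

    The index itself is not modified. Assumes every level value is a scalar.
    """
    return [tuple(None if pd.isna(v) else v for v in entry) for entry in index]

df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]],
                  columns=['c1', 'c2', 'c3'])
df = df.set_index(['c1', 'c2'])

entries = tuples_with_none(df.index)
print(entries[3])   # ('d', None)
```

The trade-off is that the result is a plain list of tuples rather than an index, but nothing inside the DataFrame is mutated.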
I think there could actually be a bug in Pandas here. But the following works for me in a similar situation:
df = df.set_index(pd.MultiIndex.from_product(df.index.levels))
The bug is that df.index.levels is the same with or without the expanded indices.
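For context, `pd.MultiIndex.from_product` builds every combination of the iterables it is given, and `df.index.levels` holds only the distinct non-missing values of each level. Applied to the question's index, that gives 4 × 2 = 8 entries with no missing values, so it does not line up row-for-row with the original 4-row index; the "similar situation" above presumably had an index where the full product was the intended result. A small sketch showing the sizes:

```python
import pandas as pd

df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]],
                  columns=['c1', 'c2', 'c3']).set_index(['c1', 'c2'])

# levels contain only the distinct, non-missing values of each level
print([list(level) for level in df.index.levels])

# from_product takes the cartesian product of those level values
full = pd.MultiIndex.from_product(df.index.levels)
print(len(df.index), len(full))   # 4 8
```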