Replacing nan with None in pandas dataframe MultiIndex

Question

I am trying to replace nan with None in a pandas dataframe MultiIndex. It seems like None is converted to nan in MultiIndex (but not in other index types).

The following does not work (taken from the question "Replace NaN in DataFrame index"):

import pandas as pd
import numpy as np
df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]], columns=['c1', 'c2', 'c3'])
df.set_index(['c1','c2'], inplace=True)
df.index = pd.MultiIndex.from_frame(df.index.to_frame().fillna(np.nan).replace([np.nan], [None]))
df
          c3
c1 c2       
a  True    1
b  True    2
c  False   3
d  NaN     4
type(df.index[3][1])
<class 'float'>

Neither does this:

index_tuples = [tuple(row) for row in df.index.to_frame().fillna(np.nan).replace([np.nan], [None]).values]
pd.MultiIndex.from_tuples(index_tuples)
MultiIndex([('a',  True),
            ('b',  True),
            ('c', False),
            ('d',   nan)],
           )

type(df.index[3][1])
<class 'float'>

It seems None is converted to NaN in MultiIndex.
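A minimal sketch of why the None does not survive, assuming the usual MultiIndex layout of per-level values plus integer codes. The two-row index below is made up for illustration; it only shows that the missing entry is stored as code -1 rather than as a value in the level.

```python
import pandas as pd

# Build a small MultiIndex whose second level contains a None entry.
mi = pd.MultiIndex.from_tuples([("a", True), ("d", None)], names=["c1", "c2"])

# Missing entries are not kept among the level values; they get code -1.
print(mi.levels[1])  # level values, without the missing entry
print(mi.codes[1])   # integer codes; -1 marks the missing entry

# Reading the entry back goes through the codes, so the original None is gone.
value = mi[1][1]
print(type(value), pd.isna(value))
```

Since the original object is never stored, replacing NaN with None before building the MultiIndex has no lasting effect.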

PS. It works for other index types:

df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]], columns=['c1', 'c2', 'c3'])
df.set_index('c2', inplace=True)
>>> df
      c1  c3
c2          
True   a   1
True   b   2
False  c   3
NaN    d   4
>>> df.index = df.index.fillna(value=np.nan).to_series().replace([np.nan], [None])
>>> df
      c1  c3
c2          
True   a   1
True   b   2
False  c   3
NaN    d   4
>>> type(df.index[3])
<class 'NoneType'>

MYousefi · Accepted Answer

The only way I managed to do it is by manipulating the underlying NumPy array directly. It seems that any assignment of None values to a MultiIndex in pandas results in conversion to NaN.

import pandas as pd
import numpy as np
df = pd.DataFrame([['a', True, 1], ['b', True, 2], ['c', False, 3], ['d', None, 4]], columns=['c1', 'c2', 'c3'])
df.set_index(['c1','c2'], inplace=True)

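def replace_nan(x):
    # x is one index entry (a tuple); return a copy with NaN replaced by None
    new_x = []
    for v in x:
        try:
            if np.isnan(v):
                new_x.append(None)
            else:
                new_x.append(v)
        except TypeError:
            # np.isnan rejects non-numeric values such as strings; keep them as-is
            new_x.append(v)
    return tuple(new_x)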


print('Before:\n', df.index)
idx = df.index.values
idx[:] = np.vectorize(replace_nan, otypes=['object'])(idx)  # Replace values in the np.array in place
print('After:\n', df.index)

Result:

Before:
 MultiIndex([('a',  True),
            ('b',  True),
            ('c', False),
            ('d',   nan)],
           names=['c1', 'c2'])
After:
 MultiIndex([('a',  True),
            ('b',  True),
            ('c', False),
            ('d',  None)],
           names=['c1', 'c2'])
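If the None values are only needed when reading the index out (for example, to serialize the keys), a gentler alternative is to convert on the way out and leave the MultiIndex itself untouched. This is a sketch of a different technique from the in-place array edit above, not a replacement for it; the name `keys` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", True, 1], ["b", True, 2], ["c", False, 3], ["d", None, 4]],
    columns=["c1", "c2", "c3"],
).set_index(["c1", "c2"])

# Convert missing entries to None while iterating over the index tuples;
# the MultiIndex stored on df is not modified.
keys = [tuple(None if pd.isna(v) else v for v in key) for key in df.index]
print(keys)
```

This avoids depending on how pandas caches the array returned by `df.index.values`, at the cost of holding the converted keys in a separate list.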

Charlie Clark · Answer

I think there could actually be a bug in pandas here. But the following works for me in a similar situation:

df = df.set_index(pd.MultiIndex.from_product(df.index.levels))

The bug is that df.index.levels is the same, with or without expanded indices.

Tags: python, pandas, dataframe

Gofrette
