I am trying to fill missing values in a subset of rows. I am using inplace=True in fillna(), but it does not work in a Jupyter notebook. The attached picture shows NaN in the first 2 rows of the Surface column. I am not sure why.
I have to do the following to make it work. Why? Thank you for your help.
data.loc[mark,'Surface']=data.loc[mark,'Surface'].fillna(value='TEST')
Here is my code:
mark=(data['Pad']==51) | (data['Pad']==52) | (data['Pad']==53) | (data['Pad']==54) | (data['Pad']==55)
data.loc[mark,'Surface'].fillna(value='TEST',inplace=True)
This one works:
data.loc[mark,'Surface']=data.loc[mark,'Surface'].fillna(value='TEST')

The main issue you're bumping into here is that pandas does not have very explicit view vs copy rules. Your result indicates to me that .loc is returning a copy instead of a view. While pandas does try to return a view from .loc, there are a decent number of caveats.
After playing around a little, it seems that using a boolean or positional index mask returns a copy. You can verify this with the private _is_view attribute (an implementation detail, so its behavior may differ between pandas versions):
import pandas as pd
import numpy as np
df = pd.DataFrame({"Pad": range(40, 60), "Surface": np.nan})
print(df)
    Pad  Surface
0    40      NaN
1    41      NaN
2    42      NaN
..  ...      ...
19   59      NaN
# Create masks
bool_mask = df["Pad"].isin(range(51, 56))
positional_mask = np.where(bool_mask)[0]
# Check `_is_view` after simple .loc:
>>> df.loc[bool_mask, "Surface"]._is_view
False
>>> df.loc[positional_mask, "Surface"]._is_view
False
So neither of the approaches above returns a "view" of the original data, which is why performing an inplace operation does not change the original dataframe. In order to get a view from .loc, you need to use a slice as your row index.
>>> df.loc[10:15, "Surface"]._is_view
True
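To make the consequence concrete, here is a minimal sketch showing that an in-place fill on a boolean-mask selection leaves the original dataframe untouched. The frame and the 0.0 fill value are made up for illustration, and recent pandas versions may also emit a chained-assignment warning on the fillna line.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the example above
df = pd.DataFrame({"Pad": range(40, 60), "Surface": np.nan})
bool_mask = df["Pad"].isin(range(51, 56))

# The boolean-mask selection is a separate object, so filling it
# in place only modifies that temporary Series, not df itself
df.loc[bool_mask, "Surface"].fillna(0.0, inplace=True)

# Count how many values in the original column are still missing
still_missing = int(df["Surface"].isna().sum())
print(still_missing)
```

If the selection were a view, the five masked rows would no longer be missing; here the count stays at the full length of the frame.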
Using a slice still won't resolve your issue on its own, because the value you're filling NaN with may change the dtype of the "Surface" column. In the example I have set up, "Surface" has a float64 dtype, and filling NaN with the string "Test" would force that dtype to change, which is incompatible with the original dataframe. If your "Surface" column has an object dtype, then you don't need to worry about this.
>>> df.dtypes
Pad int64
Surface float64
# this does not work because "Test" is incompatible with float64 dtype
>>> df.loc[10:15, "Surface"].fillna("Test", inplace=True)
# this works because 0.9 is an appropriate value for a float64 dtype
>>> df.loc[10:15, "Surface"].fillna(0.9, inplace=True)
>>> print(df)
    Pad  Surface
..  ...      ...
8    48      NaN
9    49      NaN
10   50      0.9
11   51      0.9
12   52      0.9
13   53      0.9
14   54      0.9
15   55      0.9
16   56      NaN
17   57      NaN
..  ...      ...
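If the column really is float64 and you need a string placeholder, one option (a sketch, not the only way) is to cast the column to object first and then assign the filled selection back. The frame below is again made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "Surface" starts out as an all-NaN float64 column
df = pd.DataFrame({"Pad": range(40, 60), "Surface": np.nan})
bool_mask = df["Pad"].isin(range(51, 56))

# Cast to object so the column can hold strings alongside NaN
df["Surface"] = df["Surface"].astype(object)

# Assign the filled selection back instead of relying on inplace
df.loc[bool_mask, "Surface"] = df.loc[bool_mask, "Surface"].fillna("Test")

print(df.loc[bool_mask])
```

Doing the cast explicitly keeps the dtype change visible in the code rather than leaving it to implicit upcasting during assignment.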
TL;DR: don't rely on inplace in pandas in general. Most operations still create a copy of the underlying data and then attempt to replace the original with the new copy. Pandas is not particularly memory efficient, so if you're worried about memory performance you may want to switch to something designed to be zero-copy from the ground up, like Vaex, instead of trying to go through pandas.
Your approach of assigning the slice of the dataframe is the most appropriate and will ensure you receive the correct result of updating the dataframe as "inplace" as possible:
>>> df.loc[bool_mask, "Surface"] = df.loc[bool_mask, "Surface"].fillna("Test")
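An equivalent way to write the same update, sketched here with made-up data, is to fold the missing-value check into the mask and assign the placeholder directly. This assumes the column already holds strings, so no dtype change is involved.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a string "Surface" column and some gaps
df = pd.DataFrame({
    "Pad": [50, 51, 52, 53, 56],
    "Surface": ["A", np.nan, "B", np.nan, np.nan],
})

# Rows whose Pad is in 51..55
pad_mask = df["Pad"].isin(range(51, 56))

# Only overwrite cells that are both in the subset and missing
df.loc[pad_mask & df["Surface"].isna(), "Surface"] = "TEST"

print(df)
```

The row with Pad 56 stays NaN because it falls outside the subset, and existing values inside the subset are left alone.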