Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting Cannot mask with non-boolean array containing NA / NaN values... But there aren't any missing values

I had this working, and now I changed "something" and it doesn't work any more. I have a table in Excel that I'm reading into a df. That works as expected. I read it in, and then I filter it so I only have the rows of data that match a specific criteria. Here's the code:

    df = excel_range_upper_left.options(pd.DataFrame, expand='table', header=1,
            index=False).value

    print(f'Check for missings')
    for c in df.columns:
        count = df[c].isnull().sum()
        print(f'Col {c} has {count} missing values')
    print(f'Done checking for missings')

    df = df[df["Phase"].str.contains('2')]  #, na=False)]

The for loop cycles through the df columns looking for missing values. They all come back as 0 missing values.

Then I execute the last statement, which should (I think) simply filter the df into a new df (of the same name) that has to have a string value '2' in the 'Phase' column. And that where it errors out with the Cannot mask... error.

My guess is that there is still a missing value somewhere in the table, but I've exhausted my ability to find it (I also tried the following - but it results in the same thing as the original df and still errors out:

    print(f'{exer_df.dropna(axis=1)}')

If I change the filter line to the following so it includes the na=False option, everything works. But I don't know why...

exer_df = exer_df[exer_df["Phase"].str.contains('2', na=False)]

Anyone have any insight and help?

For completeness, here are the results from the for loop and then the error:

Check for missings
Col Type as 0 missing values
Col Sub-Type as 0 missing values
Col Phase as 0 missing values
Col Body Parts as 0 missing values
Col Sets as 0 missing values
Col Reps as 0 missing values
Col Tempo as 0 missing values
Col Intensity as 0 missing values
Col Rest Period as 0 missing values
Col Exercise as 0 missing values
Col Prim Mover as 0 missing values
Done checking for miossings
Traceback (most recent call last):
  File "c:/Users/kirby/OneDrive/Beating Old Age Book/Blog Page Text/nutrition/calculations/plan_creator.py", line 271, in <module>
    mcp()
  File "c:/Users/kirby/OneDrive/Beating Old Age Book/Blog Page Text/nutrition/calculations/plan_creator.py", line 188, in mcp
    exer_df = exer_df[exer_df["Phase"].str.contains('2')]  #, na=False)]
  File "C:\Users\kirby\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\frame.py", line 3014, in __getitem__
    if com.is_bool_indexer(key):
  File "C:\Users\kirby\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\common.py", line 114, in is_bool_indexer
    raise ValueError(na_msg)
ValueError: Cannot mask with non-boolean array containing NA / NaN values
like image 743
K Chapman Avatar asked Aug 30 '25 15:08

K Chapman


2 Answers

As the original poster already stated, with the parameter na=False everything should be working fine.

If you try to use Series.str.contains in imported Excel DataFrames with blank values a ValueError raises: "Cannot mask with non-boolean array containing NA / NaN values"

Specifying na to be False (na=False), replaces NaN values with False values and avoids this error; an alternative solution would also be Series.fillna(False) before calling the function.

Somehow using other filling values does not work in this particular scenario, it needs to be a Bool.

like image 89
marinovik Avatar answered Oct 14 '25 01:10

marinovik


I just got out of something similar (the very same text of the error) by doing this:

df.dropna()[df.dropna()['Label'].str.contains('target')]

So the moral may be that you'll need to drop NaNs inside your brackets, too.

like image 26
Brett B Avatar answered Oct 14 '25 03:10

Brett B