I had this working, and now I changed "something" and it doesn't work any more. I have a table in Excel that I'm reading into a df. That works as expected. I read it in, and then I filter it so I only have the rows of data that match a specific criteria. Here's the code:
df = excel_range_upper_left.options(pd.DataFrame, expand='table', header=1,
index=False).value
print(f'Check for missings')
for c in df.columns:
count = df[c].isnull().sum()
print(f'Col {c} has {count} missing values')
print(f'Done checking for missings')
df = df[df["Phase"].str.contains('2')] #, na=False)]
The for loop cycles through the df columns looking for missing values. They all come back as 0 missing values.
Then I execute the last statement, which should (I think) simply filter the df into a new df (of the same name) that has to have a string value '2' in the 'Phase' column. And that where it errors out with the Cannot mask...
error.
My guess is that there is still a missing value somewhere in the table, but I've exhausted my ability to find it (I also tried the following - but it results in the same thing as the original df and still errors out:
print(f'{exer_df.dropna(axis=1)}')
If I change the filter line to the following so it includes the na=False
option, everything works. But I don't know why...
exer_df = exer_df[exer_df["Phase"].str.contains('2', na=False)]
Anyone have any insight and help?
For completeness, here are the results from the for loop and then the error:
Check for missings
Col Type as 0 missing values
Col Sub-Type as 0 missing values
Col Phase as 0 missing values
Col Body Parts as 0 missing values
Col Sets as 0 missing values
Col Reps as 0 missing values
Col Tempo as 0 missing values
Col Intensity as 0 missing values
Col Rest Period as 0 missing values
Col Exercise as 0 missing values
Col Prim Mover as 0 missing values
Done checking for miossings
Traceback (most recent call last):
File "c:/Users/kirby/OneDrive/Beating Old Age Book/Blog Page Text/nutrition/calculations/plan_creator.py", line 271, in <module>
mcp()
File "c:/Users/kirby/OneDrive/Beating Old Age Book/Blog Page Text/nutrition/calculations/plan_creator.py", line 188, in mcp
exer_df = exer_df[exer_df["Phase"].str.contains('2')] #, na=False)]
File "C:\Users\kirby\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\frame.py", line 3014, in __getitem__
if com.is_bool_indexer(key):
File "C:\Users\kirby\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\common.py", line 114, in is_bool_indexer
raise ValueError(na_msg)
ValueError: Cannot mask with non-boolean array containing NA / NaN values
As the original poster already stated, with the parameter na=False
everything should be working fine.
If you try to use Series.str.contains
in imported Excel DataFrames with blank values a ValueError raises: "Cannot mask with non-boolean array containing NA / NaN values"
Specifying na
to be False
(na=False
), replaces NaN values with False values and avoids this error; an alternative solution would also be Series.fillna(False)
before calling the function.
Somehow using other filling values does not work in this particular scenario, it needs to be a Bool.
I just got out of something similar (the very same text of the error) by doing this:
df.dropna()[df.dropna()['Label'].str.contains('target')]
So the moral may be that you'll need to drop NaNs inside your brackets, too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With