Prevent pandas from reading "NA" as NaN

Tags: python, pandas

The CSV file I'm reading from contains cells with the value "NA". Pandas automatically converts these into NaN, which I don't want. I'm aware of the keep_default_na=False parameter, but that changes the dtype of the columns to object, which means pd.get_dummies doesn't work correctly.

Is there any way to prevent pandas from reading "NA" as NaN without changing the dtype?

krntz asked Jan 01 '17 16:01



2 Answers

keep_default_na=False works for me:

from io import StringIO
import pandas as pd

txt = """col1,col2
a,b
NA,US"""

print(pd.read_csv(StringIO(txt), keep_default_na=False))

  col1 col2
0    a    b
1   NA   US

Without it, "NA" is read as NaN:

print(pd.read_csv(StringIO(txt)))

  col1 col2
0    a    b
1  NaN   US
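
For the dtype concern raised in the question, here is a minimal sketch, assuming the numeric columns mark missing values with empty cells. The column names and data are made up for illustration. Passing na_values=[""] together with keep_default_na=False keeps "NA" as a string while empty cells are still parsed as NaN, so the numeric column should stay float:

```python
from io import StringIO
import pandas as pd

# Hypothetical data: "NA" is a real value in `country`,
# and an empty cell marks a missing `score`.
txt = """country,score
NA,1.5
US,
DE,3.0"""

# Drop the default NaN strings, but still treat the empty string as missing.
df = pd.read_csv(StringIO(txt), keep_default_na=False, na_values=[""])

print(df)
print(df.dtypes)

# "NA" is now an ordinary category for get_dummies.
dummies = pd.get_dummies(df["country"])
print(dummies)
```

Whether this fits depends on how the file encodes genuinely missing values; any other marker would have to be listed in na_values as well.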
piRSquared answered Oct 19 '22 03:10


This is what Pandas documentation gives:

na_values : scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.

keep_default_na : bool, default True
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:

If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
If keep_default_na is False, and na_values are specified, only the NaN values specified na_values are used for parsing.
If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
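
The per-column dict form described above can be sketched as follows. This is only an illustration with made-up column names and data: with keep_default_na=False, "NA" is treated as missing only in the columns named in the dict, and stays a plain string everywhere else.

```python
from io import StringIO
import pandas as pd

# Hypothetical data: "NA" is a real value in `country`
# but means "missing" in `score`.
txt = """country,score
NA,1.5
US,NA
DE,3.0"""

# Only the `score` column gets NaN markers; `country` gets none.
df = pd.read_csv(
    StringIO(txt),
    keep_default_na=False,
    na_values={"score": ["NA", ""]},
)

print(df)
print(df.dtypes)
```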
Koo answered Oct 19 '22 02:10