 

How to clean NaN and Inf in list type data in Python

from math import *

raw_data = [3.1, float('NaN'), 3.14, 3.141, 3.1415, float('Nan'), 3.14159, float('Inf'), float('-Inf'), 3.1415926]

filtered_data = [] 

for v in raw_data:
    if not ((v is float('NaN')) | (v is float('Inf')) | (v is float('-Inf'))): 
        filtered_data.append(v)

print(filtered_data)

print(raw_data[1]) 

print(raw_data[1] is float('NaN'))

I'm trying to remove the NaN, Inf and -Inf values in the list data. The if condition seems to take no effect. raw_data[1] is a NaN. Why then is print(raw_data[1] is float('NaN')) False?

chentaocuc asked Sep 11 '25 15:09


2 Answers

If you step through your code, you can begin to see what is happening:

>>> import math 
>>> raw_data = [56.2, float('NaN'), 51.7, 
...     55.3, 52.5, float('Nan'), 47.8, float('Inf'), float('-Inf')]
>>> 
>>> print(raw_data)
[56.2, nan, 51.7, 55.3, 52.5, nan, 47.8, inf, -inf]
>>> filtered_data = [] 
>>> for v in raw_data:
...     if not ((v is float('NaN')) | (v is float('Inf')) | (v is float('-Inf'))):
...         filtered_data.append(v)
... 
>>> filtered_data
[56.2, nan, 51.7, 55.3, 52.5, nan, 47.8, inf, -inf]

So clearly your attempt to remove the NaNs and infinities isn't working: nothing was filtered out. Let's begin at the beginning and determine what went wrong!

>>> raw_data
[56.2, nan, 51.7, 55.3, 52.5, nan, 47.8, inf, -inf]
>>> raw_data[0]
56.2
>>> raw_data[1]
nan
>>> raw_data[1] is float('Nan')
False
>>> raw_data[1] == float('Nan')
False

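Aha! Neither test detects a NaN. is compares object identity, and every call to float('NaN') creates a new object, so the identity check is always False. == fails too, because a NaN never compares equal to anything, including itself. Looking for methods in math is a start.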

>>> math.
math.acos(      math.cosh(      math.fmod(      math.isnan(     math.pow(      
math.acosh(     math.degrees(   math.frexp(     math.ldexp(     math.radians(  
math.asin(      math.e          math.fsum(      math.lgamma(    math.sin(      
math.asinh(     math.erf(       math.gamma(     math.log(       math.sinh(     
math.atan(      math.erfc(      math.gcd(       math.log10(     math.sqrt(     
math.atan2(     math.exp(       math.hypot(     math.log1p(     math.tan(      
math.atanh(     math.expm1(     math.inf        math.log2(      math.tanh(     
math.ceil(      math.fabs(      math.isclose(   math.modf(      math.tau       
math.copysign(  math.factorial( math.isfinite(  math.nan        math.trunc(    
math.cos(       math.floor(     math.isinf(     math.pi

Here we see isnan() as well as isinf(). Let's try it:

>>> math.isnan(raw_data[1])
True
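
To make the contrast concrete, here is a minimal sketch of why the is and == tests came back False while math.isnan() works. The variable names are made up for illustration, and the identity result reflects how CPython builds a fresh object for each float('nan') call.

```python
import math

# Two separately created NaN objects (names are illustrative only).
nan_a = float('nan')
nan_b = float('nan')

# Identity: each float('nan') call returns a distinct object in CPython.
identity_result = nan_a is nan_b

# Equality: a NaN compares unequal to everything, even to itself.
equality_result = nan_a == nan_a

# math.isnan() inspects the value, so it is the reliable test.
isnan_result = math.isnan(nan_a)

print(identity_result, equality_result, isnan_result)
```

The same reasoning explains why the infinity checks in the question fail: is asks whether two names point at the same object, not whether the values match.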

Good, so now we can test accurately. Let's turn to that loop.

>>> filtered_data = [v for v in raw_data if not math.isnan(v)]
>>> filtered_data
[56.2, 51.7, 55.3, 52.5, 47.8, inf, -inf]

That created a list and assigned it to filtered_data in one step, using a list comprehension, which is more pythonic and usually a little faster than appending in an explicit loop. The for loop sits inside the [], and the if clause at the end of the statement is the filter: only the values of v that pass it end up in the new list.
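
As a rough sketch of that equivalence, the explicit loop and the comprehension below build the same list. The shortened raw_data and the names loop_result and comp_result are chosen here just for illustration.

```python
import math

# A shortened sample list, assumed for this illustration.
raw_data = [56.2, float('nan'), 51.7]

# Explicit loop: append each value that passes the filter.
loop_result = []
for v in raw_data:
    if not math.isnan(v):
        loop_result.append(v)

# List comprehension: same loop and same filter in one expression.
comp_result = [v for v in raw_data if not math.isnan(v)]

print(loop_result)
print(comp_result)
```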

It can take compound conditionals as well:

>>> filtered_data = [v for v in raw_data if not math.isnan(v) and not math.isinf(v)]
>>> filtered_data
[56.2, 51.7, 55.3, 52.5, 47.8]

Notice that isinf() took care of positive and negative infinities.
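
The completion listing above also shows math.isfinite(), which returns False for NaN and for both infinities. As a possible shortcut, and assuming Python 3.2 or later, the two tests can be collapsed into one:

```python
import math

raw_data = [56.2, float('nan'), 51.7, 55.3, 52.5,
            float('nan'), 47.8, float('inf'), float('-inf')]

# math.isfinite(v) is True only when v is neither NaN nor +/- infinity.
finite_data = [v for v in raw_data if math.isfinite(v)]

print(finite_data)  # [56.2, 51.7, 55.3, 52.5, 47.8]
```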

Shawn Mehan answered Sep 13 '25 11:09


Use math.isnan and math.isinf and list comprehensions:

import math


raw_data = [56.2, float('NaN'), 51.7, 55.3, 52.5, float('NaN'), 47.8, float('Inf'), float('-Inf')]

filtered_data = [v for v in raw_data if not (math.isinf(v) or math.isnan(v))]

print(raw_data)
print(filtered_data)

Output:

[56.2, nan, 51.7, 55.3, 52.5, nan, 47.8, inf, -inf]
[56.2, 51.7, 55.3, 52.5, 47.8]

Note that | is a bitwise operator; you should use the boolean operator or instead. Boolean operators short-circuit, so the right-hand side is skipped once the result is known, whereas bitwise operators always evaluate both operands.
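
A small sketch of that difference follows. The helper check() and the calls list are hypothetical, added only to record which operands actually get evaluated.

```python
calls = []

def check(name, result):
    """Record that this operand was evaluated, then return its value."""
    calls.append(name)
    return result

# Boolean or: stops as soon as the left operand is truthy.
calls.clear()
or_result = check('left', True) or check('right', False)
or_calls = list(calls)

# Bitwise |: both operands are always evaluated before combining them.
calls.clear()
pipe_result = check('left', True) | check('right', False)
pipe_calls = list(calls)

print(or_result, or_calls)      # True ['left']
print(pipe_result, pipe_calls)  # True ['left', 'right']
```

Both expressions yield True here, but only or skips the second call. | also binds more tightly than comparison operators, which is why the question's condition needed the extra parentheses around each test.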

FJSevilla answered Sep 13 '25 11:09