
-9999 as missing value with numpy.genfromtxt()

Let's say I have a dumb text file with the contents:

Year    Recon   Observed
1505    162.38        23      
1506     46.14     -9999      
1507    147.49     -9999      

-9999 is used to denote a missing value (don't ask).

So, I should be able to read this into a Numpy array with:

import numpy as np
x = np.genfromtxt("file.txt", dtype = None, names = True, missing_values = -9999)

And have all my little -9999s turn into numpy.nan. But, I get:

>>> x
array([(1505, 162.38, 23), (1506, 46.14, -9999), (1507, 147.49, -9999)], 
  dtype=[('Year', '<i8'), ('Recon', '<f8'), ('Observed', '<i8')])

... That's not right...

Am I missing something?

brews asked Sep 14 '25 21:09



1 Answer

Nope, you're not doing anything wrong. Using the missing_values argument indeed tells np.genfromtxt that the corresponding values should be flagged as "missing/invalid". The problem is that dealing with missing values is only supported if you use the usemask=True argument (I probably should have made that clearer in the documentation, my bad).

With usemask=True, the output is a masked array. You can transform it into a regular ndarray with the missing values replaced by np.nan with the method .filled(np.nan).
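As a minimal sketch of this, the question's data can be read with usemask=True and one column filled with NaN. The in-memory io.StringIO stand-in for file.txt and the choice of dtype=float (so that every column is able to hold NaN) are assumptions made for illustration, not part of the original question.

```python
import io
import numpy as np

# In-memory stand-in for "file.txt" (an assumption, to keep the example self-contained).
text = io.StringIO(
    "Year    Recon   Observed\n"
    "1505    162.38        23\n"
    "1506     46.14     -9999\n"
    "1507    147.49     -9999\n"
)

# dtype=float is an assumption: it reads every column as float so NaN can be stored.
x = np.genfromtxt(text, dtype=float, names=True,
                  missing_values="-9999", usemask=True)

# x is a masked array; fields whose text was "-9999" are masked.
observed_mask = x["Observed"].mask

# Replace the masked entries with NaN to get a plain ndarray for that column.
observed = x["Observed"].filled(np.nan)
print(observed)
```

Filling a single column keeps the example small; the same .filled(np.nan) call should also apply to other float columns.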

Be careful, though: if you have a column that was detected as having an int dtype and you try to fill its missing values with np.nan, you won't get what you expect (np.nan is only supported for float columns).
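One possible way around the integer-column caveat, sketched here as an assumption rather than the only fix, is to cast the masked integer column to float before filling it. The io.StringIO stand-in for file.txt is hypothetical and only there to make the snippet self-contained.

```python
import io
import numpy as np

# In-memory stand-in for "file.txt" (an assumption, to keep the example self-contained).
text = io.StringIO(
    "Year    Recon   Observed\n"
    "1505    162.38        23\n"
    "1506     46.14     -9999\n"
    "1507    147.49     -9999\n"
)

# dtype=None lets genfromtxt infer the column types, so "Observed" is read as integers.
x = np.genfromtxt(text, dtype=None, names=True,
                  missing_values="-9999", usemask=True)

# Cast the masked integer column to float first (the mask is carried along),
# then replace the masked entries with NaN.
observed_float = x["Observed"].astype(float).filled(np.nan)
print(observed_float)
```

The cast produces a new float array, so the integer column in x itself is left unchanged.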

Pierre GM answered Sep 17 '25 11:09
