
How to replace values when loading data with genfromtxt

I wonder how I can replace specific values when loading data from a given (csv) file with multiple columns, combining both strings and numerical values.

In the example below, suppose you have a number of geographical positions with known latitudes and longitudes, a set of properties (P1-P5), and a class (included just to bring a string column into the problem). Some values are missing, and genfromtxt replaces them properly (the missing-value flag here is -999). In addition, some values are incorrect (fake values or other kinds of flags), such as 0.0. How can we replace 0.0 with -999?

Data:

Name,lat,long,P1,P2,P3,P4,P5,Class
id1,71.234,10.123,0.0,11,212,222,1920,A
id2,72.234,11.111,,,312,342,1920,A
id3,77.832,12.111,1,0.0,,333,4520,B
id4,77.987,12.345,3,0.0,,231,2020,B
id5,77.111,13.099,5,11,212,222,1920,A

And the code so far:

dfile = "data.csv"
missing_value = -999

import numpy as np

data = np.genfromtxt(dfile, unpack=True, comments='#', names=True,
                    autostrip=True, filling_values=missing_value,
                    # one dtype per column: Name, lat, long, P1-P5, Class
                    dtype=('S5', 'float', 'float', 'float', 'float', 'float', 'float', 'float', 'S1'),
                    delimiter=',',
                    )
new_data = np.where(data!=0.0 ,data, -999)

I tried np.where, as in np.where(data!=0.0 ,data, -999), but got an error:

TypeError: invalid type promotion

I do not know what I am missing...

ps 1. Perhaps it is solvable with pandas, but I am looking for a solution that does not depend on it.

ps 2. I know that a dirty workaround would be to set the incorrect values (the 0.0s) to my missing flag in the initial file, but what if there are multiple values that we would like to exclude (or we are combining data with different flags)?

gmaravel asked Dec 06 '25 08:12


1 Answer

Define a simple text:

In [55]: txt= '''foo,bar,test 
    ...: a,1,2 
    ...: b,3,4 
    ...: ''' 

load with genfromtxt:

In [60]: data = np.genfromtxt(txt.splitlines(), encoding=None, names=True, dtype=None, delimiter=',')           
In [61]: data                                                                                                   
Out[61]: 
array([('a', 1, 2), ('b', 3, 4)],
      dtype=[('foo', '<U1'), ('bar', '<i8'), ('test', '<i8')])

Note the dtype - fields with different dtype and names.

Access fields by name:

In [64]: data['foo']                                                                                            
Out[64]: array(['a', 'b'], dtype='<U1')

Modify one element of a field by index:

In [65]: data['bar']                                                                                            
Out[65]: array([1, 3])
In [66]: data['bar'][0] = 23                                                                                    

Modify another field with a boolean mask (or np.where). Since test is a view into data, the assignment changes data as well:

In [67]: test = data['test']                                                                                    
In [68]: test                                                                                                   
Out[68]: array([2, 4])
In [69]: test==2                                                                                                
Out[69]: array([ True, False])
In [70]: test[test==2]=0                                                                                        
In [71]: test                                                                                                   
Out[71]: array([0, 4])
In [72]: data                                                                                                   
Out[72]: 
array([('a', 23, 0), ('b',  3, 4)],
      dtype=[('foo', '<U1'), ('bar', '<i8'), ('test', '<i8')])
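
Applied to the data in the question, the same per-field assignment could look like the sketch below. It assumes the CSV text from the question is held in a string (csv_text here) and that the columns to clean are P1 to P5; numeric_fields and bad_flags are names made up for this illustration, not part of the original code. np.isin allows several flag values to be replaced in one pass, which would cover the "multiple flags" case from ps 2.

```python
import numpy as np

# The sample data from the question, inlined so the sketch is self-contained.
csv_text = """Name,lat,long,P1,P2,P3,P4,P5,Class
id1,71.234,10.123,0.0,11,212,222,1920,A
id2,72.234,11.111,,,312,342,1920,A
id3,77.832,12.111,1,0.0,,333,4520,B
id4,77.987,12.345,3,0.0,,231,2020,B
id5,77.111,13.099,5,11,212,222,1920,A
"""

missing_value = -999

# One (name, type) pair per column; the header row is skipped instead of parsed.
dtype = [("Name", "U5"), ("lat", float), ("long", float),
         ("P1", float), ("P2", float), ("P3", float),
         ("P4", float), ("P5", float), ("Class", "U1")]

data = np.genfromtxt(csv_text.splitlines(), encoding=None, skip_header=1,
                     delimiter=",", dtype=dtype, autostrip=True,
                     filling_values=missing_value)

# Hypothetical helpers: which fields to clean and which values count as bad.
numeric_fields = ("P1", "P2", "P3", "P4", "P5")
bad_flags = [0.0]

for name in numeric_fields:
    column = data[name]  # a view, so writing to it updates data in place
    column[np.isin(column, bad_flags)] = missing_value

print(data["P1"])
print(data["P2"])
```

lat and long are left untouched because they are not listed in numeric_fields, and the string fields never take part in a numeric comparison.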

Replacement might be easier if you group the numeric fields into a single field (but that requires more understanding of structured array dtypes):

In [80]: data = np.genfromtxt(txt.splitlines(), encoding=None, skip_header=1, dtype=[('id','U3'),('foo',int,2)],
    ...:  delimiter=',')                                                                                        
In [81]: data                                                                                                   
Out[81]: 
array([('a', [1, 2]), ('b', [3, 4])],
      dtype=[('id', '<U3'), ('foo', '<i8', (2,))])
In [82]: data['foo']                                                                                            
Out[82]: 
array([[1, 2],
       [3, 4]])
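
For the data in the question, such a grouping might look like the sketch below. The field names pos and P, and the split into 2 + 5 float columns, are assumptions chosen for illustration. Because data['P'] is then an ordinary 2-D float array rather than a structured one, np.where can be applied to it directly.

```python
import numpy as np

# The sample data from the question, inlined so the sketch is self-contained.
csv_text = """Name,lat,long,P1,P2,P3,P4,P5,Class
id1,71.234,10.123,0.0,11,212,222,1920,A
id2,72.234,11.111,,,312,342,1920,A
id3,77.832,12.111,1,0.0,,333,4520,B
id4,77.987,12.345,3,0.0,,231,2020,B
id5,77.111,13.099,5,11,212,222,1920,A
"""

missing_value = -999

# Hypothetical grouping: lat/long in one 2-element field, P1-P5 in one
# 5-element field. The flat CSV columns are mapped onto this layout in order.
grouped_dtype = [("Name", "U5"), ("pos", float, 2),
                 ("P", float, 5), ("Class", "U1")]

data = np.genfromtxt(csv_text.splitlines(), encoding=None, skip_header=1,
                     delimiter=",", dtype=grouped_dtype,
                     filling_values=missing_value)

# data["P"] has shape (5, 5) and a plain float dtype, so np.where is fine here.
data["P"] = np.where(data["P"] != 0.0, data["P"], missing_value)

print(data["P"])
```

The trade-off is that individual properties are then addressed by position (data['P'][:, 0] for P1) instead of by name.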
hpaulj answered Dec 07 '25 20:12


