How to replace values when loading data with genfromtxt

Question

I would like to know how to replace specific values when loading data from a CSV file with multiple columns that mix strings and numerical values.

In the example that follows, suppose that you have a number of geographical positions with known latitudes and longitudes, a set of properties (P1-P5), and a class (included to cover the string component of the problem). Missing values are properly replaced by genfromtxt (the missing-value flag in this case is -999), but there are also values that are not correct (fake values, or other kinds of flags), such as 0.0. How can we replace 0.0 with -999?

Data:

Name,lat,long,P1,P2,P3,P4,P5,Class
id1,71.234,10.123,0.0,11,212,222,1920,A
id2,72.234,11.111,,,312,342,1920,A
id3,77.832,12.111,1,0.0,,333,4520,B
id4,77.987,12.345,3,0.0,,231,2020,B
id5,77.111,13.099,5,11,212,222,1920,A

And the code so far:

dfile = "data.csv"
missing_value = -999

import numpy as np

data = np.genfromtxt(dfile, unpack=True, comments='#', names=True,
                     autostrip=True, filling_values=missing_value,
                     # one dtype per column: Name, lat, long, P1-P5, Class
                     dtype=('S5', 'float', 'float', 'float', 'float', 'float',
                            'float', 'float', 'S1'),
                     delimiter=',')
new_data = np.where(data!=0.0 ,data, -999)

I tried np.where(data != 0.0, data, -999), but I got an error:

TypeError: invalid type promotion

I do not know what I am missing...

PS 1. Perhaps this is solvable with pandas, but I am looking for a solution that does not depend on it.

PS 2. I know that a dirty workaround would be to replace the incorrect values (the 0.0s) with my missing-value flag in the initial file, but what if there are multiple values that we would like to exclude (or data with different flags to combine)?

hpaulj · Accepted Answer

Define a simple text sample:

In [55]: txt= '''foo,bar,test 
    ...: a,1,2 
    ...: b,3,4 
    ...: '''

Load it with genfromtxt:

In [60]: data = np.genfromtxt(txt.splitlines(), encoding=None, names=True, dtype=None, delimiter=',')           
In [61]: data                                                                                                   
Out[61]: 
array([('a', 1, 2), ('b', 3, 4)],
      dtype=[('foo', '<U1'), ('bar', '<i8'), ('test', '<i8')])

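Note the dtype: this is a structured array in which each field has its own name and dtype. The array itself is 1-D, with one record per row, so np.where(data != 0.0, data, -999) fails: the records and a plain number such as -999 have no common dtype. The replacement has to be done field by field.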
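As a small sketch of what that dtype implies, the snippet below reloads the same sample text and inspects the result. The variable name numeric_fields is only illustrative, and selecting fields by dtype.kind is one possible convention, not something the answer prescribes.

```python
import numpy as np

txt = """foo,bar,test
a,1,2
b,3,4
"""

data = np.genfromtxt(txt.splitlines(), encoding=None, names=True,
                     dtype=None, delimiter=',')

# The array is 1-D: one record per row, not a 2-D table of numbers.
print(data.shape)        # (2,)
print(data.dtype.names)  # ('foo', 'bar', 'test')

# dtype.kind separates numeric fields ('i', 'u', 'f') from string fields.
numeric_fields = [name for name in data.dtype.names
                  if data.dtype[name].kind in 'iuf']
print(numeric_fields)    # ['bar', 'test']
```

Knowing which fields are numeric makes it possible to apply a replacement to those fields only and leave the string fields alone.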

Access fields by name:

In [64]: data['foo']                                                                                            
Out[64]: array(['a', 'b'], dtype='<U1')

Modify one element of a field by index:

In [65]: data['bar']                                                                                            
Out[65]: array([1, 3])
In [66]: data['bar'][0] = 23

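Modify another field with a boolean mask (np.where would also work). Indexing by field name returns a view, so the change shows up in data as well: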

In [67]: test = data['test']                                                                                    
In [68]: test                                                                                                   
Out[68]: array([2, 4])
In [69]: test==2                                                                                                
Out[69]: array([ True, False])
In [70]: test[test==2]=0                                                                                        
In [71]: test                                                                                                   
Out[71]: array([0, 4])
In [72]: data                                                                                                   
Out[72]: 
array([('a', 23, 0), ('b',  3, 4)],
      dtype=[('foo', '<U1'), ('bar', '<i8'), ('test', '<i8')])
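Applied to the data from the question, the same field-by-field approach could look like the sketch below. It assumes the CSV is read from an inline string instead of data.csv, that only P1-P5 should be cleaned (a latitude or longitude of 0.0 can be a legitimate value), and it uses an explicit dtype with skip_header=1 in place of names=True. The names csv_text and property_fields are illustrative.

```python
import numpy as np

csv_text = """Name,lat,long,P1,P2,P3,P4,P5,Class
id1,71.234,10.123,0.0,11,212,222,1920,A
id2,72.234,11.111,,,312,342,1920,A
id3,77.832,12.111,1,0.0,,333,4520,B
id4,77.987,12.345,3,0.0,,231,2020,B
id5,77.111,13.099,5,11,212,222,1920,A
"""

missing_value = -999

# One (name, type) pair per column; the header row is skipped because
# the names are given here.
dtype = [('Name', 'U5'), ('lat', float), ('long', float),
         ('P1', float), ('P2', float), ('P3', float),
         ('P4', float), ('P5', float), ('Class', 'U1')]

data = np.genfromtxt(csv_text.splitlines(), delimiter=',', skip_header=1,
                     dtype=dtype, encoding=None,
                     filling_values=missing_value)

# Replace the 0.0 flag one numeric field at a time.
property_fields = ['P1', 'P2', 'P3', 'P4', 'P5']
for name in property_fields:
    column = data[name]                    # a view into data, not a copy
    column[column == 0.0] = missing_value  # so data is modified in place

print(data['P1'])
print(data['P2'])
```

After the loop, the empty cells and the 0.0 entries in P1-P5 all carry the same -999 flag, while Name, lat, long, and Class are untouched.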

Replacement might be easier if you grouped the numeric fields into one (but that requires more understanding of structured array dtypes):

In [80]: data = np.genfromtxt(txt.splitlines(), encoding=None, skip_header=1, dtype=[('id','U3'),('foo',int,2)],
    ...:  delimiter=',')                                                                                        
In [81]: data                                                                                                   
Out[81]: 
array([('a', [1, 2]), ('b', [3, 4])],
      dtype=[('id', '<U3'), ('foo', '<i8', (2,))])
In [82]: data['foo']                                                                                            
Out[82]: 
array([[1, 2],
       [3, 4]])
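With the numeric columns grouped like this, the replacement becomes a single operation on a 2-D array. The following is a sketch on the question's data, not a definitive recipe: the field name P, the list bad_flags, and the extra flag -1.0 are assumptions made for illustration (only 0.0 appears in the sample), and np.isin is used here to cover the case of several flags raised in PS 2.

```python
import numpy as np

csv_text = """Name,lat,long,P1,P2,P3,P4,P5,Class
id1,71.234,10.123,0.0,11,212,222,1920,A
id2,72.234,11.111,,,312,342,1920,A
id3,77.832,12.111,1,0.0,,333,4520,B
id4,77.987,12.345,3,0.0,,231,2020,B
id5,77.111,13.099,5,11,212,222,1920,A
"""

missing_value = -999
bad_flags = [0.0, -1.0]  # -1.0 is a hypothetical second flag

# P1-P5 are grouped into one field 'P' holding 5 floats per row.
dtype = [('Name', 'U5'), ('lat', float), ('long', float),
         ('P', float, 5), ('Class', 'U1')]

data = np.genfromtxt(csv_text.splitlines(), delimiter=',', skip_header=1,
                     dtype=dtype, encoding=None,
                     filling_values=missing_value)

props = data['P']                                 # (5, 5) view into data
props[np.isin(props, bad_flags)] = missing_value  # replace every flagged value

print(data['P'])
```

The trade-off is that the individual properties are then addressed by position, for example data['P'][:, 0] for P1, instead of by name.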


Tags: python, arrays, numpy, genfromtxt

gmaravel
