How can I replace specific values when loading data from a CSV file with multiple columns that mix strings and numerical values?
In the example that follows, suppose you have a number of geographical positions with known latitudes and longitudes, a set of properties (P1-P5), and a class (included just to bring in the string component of the problem). Some values are missing, and genfromtxt correctly replaces them with the missing-value flag (-999 in this case). In addition, some values are not valid (fake values or other kinds of flags), such as 0.0. How can we replace 0.0 with -999?
Data:
Name,lat,long,P1,P2,P3,P4,P5,Class
id1,71.234,10.123,0.0,11,212,222,1920,A
id2,72.234,11.111,,,312,342,1920,A
id3,77.832,12.111,1,0.0,,333,4520,B
id4,77.987,12.345,3,0.0,,231,2020,B
id5,77.111,13.099,5,11,212,222,1920,A
And the code so far:
import numpy as np
dfile = "data.csv"
missing_value = -999
# One dtype entry per column: Name, lat, long, P1-P5, Class (9 in total)
data = np.genfromtxt(dfile, unpack=True, comments='#', names=True,
autostrip=True, filling_values=missing_value,
dtype=('S5', 'float', 'float', 'float', 'float', 'float', 'float', 'float', 'S1')
, delimiter=',',
)
new_data = np.where(data!=0.0 ,data, -999)
I have used the np.where as in np.where(data!=0.0 ,data, -999) but I got an error:
TypeError: invalid type promotion
I do not know what I am missing...
PS 1. Perhaps this is solvable with pandas, but I am looking for a solution that does not depend on it.
PS 2. I know that a dirty workaround would be to set the incorrect values (the 0.0s) to my missing flag in the initial file, but what if there are multiple values that we would like to exclude (or we are combining data with different flags)?
Define a simple text:
In [55]: txt= '''foo,bar,test
...: a,1,2
...: b,3,4
...: '''
load with genfromtxt:
In [60]: data = np.genfromtxt(txt.splitlines(), encoding=None, names=True, dtype=None, delimiter=',')
In [61]: data
Out[61]:
array([('a', 1, 2), ('b', 3, 4)],
dtype=[('foo', '<U1'), ('bar', '<i8'), ('test', '<i8')])
Note the dtype: this is a structured array, and each field has its own name and its own dtype.
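As a rough illustration of why the per-field dtypes matter, the sketch below lists the fields of the sample array and picks out the numeric ones. A structured array like this has no single type that both the string and the integer fields can be promoted to, which is most likely what the "invalid type promotion" error from calling np.where on the whole array is complaining about. The name numeric_fields is only illustrative.

```python
import numpy as np

txt = """foo,bar,test
a,1,2
b,3,4
"""

data = np.genfromtxt(txt.splitlines(), encoding=None, names=True,
                     dtype=None, delimiter=',')

# Each field carries its own dtype; keep only the names of the numeric ones.
numeric_fields = [name for name in data.dtype.names
                  if np.issubdtype(data.dtype[name], np.number)]
print(numeric_fields)
```

Any replacement then has to be done one field at a time, on fields whose dtype can actually hold the replacement value.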
Access fields by name:
In [64]: data['foo']
Out[64]: array(['a', 'b'], dtype='<U1')
Modify one field by index:
In [65]: data['bar']
Out[65]: array([1, 3])
In [66]: data['bar'][0] = 23
Modify another with boolean test (or where):
In [67]: test = data['test']
In [68]: test
Out[68]: array([2, 4])
In [69]: test==2
Out[69]: array([ True, False])
In [70]: test[test==2]=0
In [71]: test
Out[71]: array([0, 4])
In [72]: data
Out[72]:
array([('a', 23, 0), ('b', 3, 4)],
dtype=[('foo', '<U1'), ('bar', '<i8'), ('test', '<i8')])
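Applied to the data from the question, the same field-by-field idea could look like the sketch below. It assumes that only the property columns P1-P5 should be cleaned (a latitude or longitude of 0.0 could be a valid position), and the names csv_text, property_fields and bad_values are made up for this example. The explicit dtype list with skip_header=1 is one way to name the fields, not the only one.

```python
import numpy as np

# Sample data copied from the question.
csv_text = """Name,lat,long,P1,P2,P3,P4,P5,Class
id1,71.234,10.123,0.0,11,212,222,1920,A
id2,72.234,11.111,,,312,342,1920,A
id3,77.832,12.111,1,0.0,,333,4520,B
id4,77.987,12.345,3,0.0,,231,2020,B
id5,77.111,13.099,5,11,212,222,1920,A
"""

missing_value = -999
property_fields = ["P1", "P2", "P3", "P4", "P5"]  # assumed columns to clean
bad_values = [0.0]  # extend with any other flags that should be excluded

dtype = ([("Name", "U5"), ("lat", float), ("long", float)]
         + [(name, float) for name in property_fields]
         + [("Class", "U1")])

data = np.genfromtxt(csv_text.splitlines(), delimiter=",", skip_header=1,
                     dtype=dtype, encoding=None, autostrip=True,
                     filling_values=missing_value)

for name in property_fields:
    column = data[name]  # a view, so the assignment below modifies `data`
    column[np.isin(column, bad_values)] = missing_value

print(data[property_fields[0]])
```

Because bad_values is a list, several different flags can be replaced in one pass without editing the input file.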
Replacement might be easier if you grouped the numeric fields into one (but that requires more understanding of structured array dtypes):
In [80]: data = np.genfromtxt(txt.splitlines(), encoding=None, skip_header=1, dtype=[('id','U3'),('foo',int,2)],
...: delimiter=',')
In [81]: data
Out[81]:
array([('a', [1, 2]), ('b', [3, 4])],
dtype=[('id', '<U3'), ('foo', '<i8', (2,))])
In [82]: data['foo']
Out[82]:
array([[1, 2],
[3, 4]])
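With the numeric columns grouped like this, a single boolean mask covers all of them at once. The sketch below is only an illustration on the small sample text; the value being replaced (2) and the replacement (-999) are arbitrary choices for the example.

```python
import numpy as np

txt = """foo,bar,test
a,1,2
b,3,4
"""

data = np.genfromtxt(txt.splitlines(), encoding=None, skip_header=1,
                     dtype=[('id', 'U3'), ('foo', int, (2,))],
                     delimiter=',')

values = data['foo']          # 2-D view onto both numeric columns
values[values == 2] = -999    # one mask replaces the flag in every column
print(data)
```

The trade-off is that the individual columns are no longer addressable by name, only by position within the grouped field.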