I have a pandas DataFrame created from a CSV file. One column of this DataFrame contains numeric data that is initially read in as strings. Most entries are numeric-like, but some contain various non-numeric error codes. I do not know beforehand what all the error codes might be or how many there are. So, for instance, the DataFrame might look like:
[In 1]: df
[Out 1]:
            data     OtherAttr
MyIndex
0           1.4        aaa
1           error1     foo
2           2.2        bar
3           0.8        bar
4           xxx        bbb
...
743733      BadData    ccc
743734      7.1        foo
I want to cast df.data to float and throw out any values that don't convert properly. Is there built-in functionality for this? Something like:
df.data = df.data.astype(float, skipbad = True)
(I know that this specific call will not work, and I don't see any kwargs in astype that do what I want.)
I guess I could write a function using try and then use pandas apply or map, but that seems like an inelegant solution. This must be a fairly common problem, right?
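Use the convert_objects method, which "attempts to infer better dtype for object columns". (Note that convert_objects was deprecated in later pandas releases and has since been removed; on current versions, pd.to_numeric with errors='coerce' does the same job.)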
In [11]: df['data'].convert_objects(convert_numeric=True)
Out[11]: 
0    1.4
1    NaN
2    2.2
3    0.8
4    NaN
Name: data, dtype: float64
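On pandas versions where convert_objects is no longer available, the same coercion can be sketched with pd.to_numeric and errors='coerce'. The sample frame below is rebuilt by hand from the first five rows shown in the question, so treat it as an illustration rather than the asker's actual data:

```python
import pandas as pd

# Hand-built sample mirroring the first five rows shown in the question
# (an assumption; the real data comes from a CSV file).
df = pd.DataFrame(
    {
        "data": ["1.4", "error1", "2.2", "0.8", "xxx"],
        "OtherAttr": ["aaa", "foo", "bar", "bar", "bbb"],
    },
    index=pd.Index(range(5), name="MyIndex"),
)

# errors="coerce" turns every value that cannot be parsed as a number into NaN
# instead of raising, so unknown error codes need not be listed in advance.
numeric = pd.to_numeric(df["data"], errors="coerce")
print(numeric)
```

The result is a float64 Series with NaN in place of each error code, and the original column is left untouched until you assign the result back.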
In fact, you can apply this to the entire DataFrame:
In [12]: df.convert_objects(convert_numeric=True)
Out[12]: 
         data OtherAttr
MyIndex                
0         1.4       aaa
1         NaN       foo
2         2.2       bar
3         0.8       bar
4         NaN       bbb
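Since the question also asks to throw out the values that fail to convert, here is a minimal sketch of that last step. It assumes only the data column should be numeric and uses pd.to_numeric followed by dropna. The sample frame and the name cleaned are made up for illustration:

```python
import pandas as pd

# Hypothetical sample based on the rows shown in the question.
df = pd.DataFrame(
    {
        "data": ["1.4", "error1", "2.2", "0.8", "xxx"],
        "OtherAttr": ["aaa", "foo", "bar", "bar", "bbb"],
    },
    index=pd.Index(range(5), name="MyIndex"),
)

# Convert only the column expected to hold numbers; text columns such as
# OtherAttr are left alone (coercing them would turn every value into NaN).
df["data"] = pd.to_numeric(df["data"], errors="coerce")

# Drop the rows whose "data" value failed to convert.
cleaned = df.dropna(subset=["data"])
print(cleaned)
```

Restricting dropna to subset=["data"] keeps rows that happen to have missing values in other columns, which may or may not be what you want.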