
Removing values with mixed datatypes

I have the following DataFrame

A       B       C
1.0     abc     1.0
abc     1.0     abc
-1.11   abc     abc

The columns hold mixed datatypes (float and str). How can I drop values <= -1 in column A?

Because of the mixed datatypes, I get an error if I do the following:

df['A'] = (df['A'] != "abc") & (df['A'] > -1)
TypeError: '>' not supported between instances of 'str' and 'int'

How can I change my object column so that abc is treated as a str and 1.0 as a float, so that I can evaluate:

(df['A'] != "abc") & (df['A'] > -1)

print(df['A'].dtype)
    -> object

This is the output I expect:

df = 

A       B       C
1.0     abc     1.0
abc     1.0     abc
NaN     abc     abc
satoshi asked Oct 16 '25 04:10


1 Answer

There are at least a couple of different approaches to this problem.

loc + pd.to_numeric

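pd.DataFrame.loc accepts a Boolean series, so you can calculate a mask via pd.to_numeric and pass it to the loc setter.

Note there is no need to specify df['A'] != 'abc': with errors='coerce', pd.to_numeric converts non-numeric values to NaN, and any comparison with NaN evaluates to False, so those rows are never selected by the mask.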

mask = pd.to_numeric(df['A'], errors='coerce') <= -1
df.loc[mask, 'A'] = np.nan

print(df)

     A    B    C
0    1  abc    1
1  abc    1  abc
2  NaN  abc  abc
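
For reference, here is a self-contained sketch of the same approach that rebuilds the DataFrame from the question, so it can be run as-is. The intermediate name numeric_a is only illustrative, and the comparison uses <= -1 to match the threshold stated in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the mixed-type DataFrame from the question (all columns are dtype object).
df = pd.DataFrame({
    "A": [1.0, "abc", -1.11],
    "B": ["abc", 1.0, "abc"],
    "C": [1.0, "abc", "abc"],
})

# Coerce column A to numbers; anything non-numeric ("abc") becomes NaN.
numeric_a = pd.to_numeric(df["A"], errors="coerce")

# Comparisons against NaN are False, so only real numbers <= -1 are selected.
mask = numeric_a <= -1

# Overwrite only the selected cells; the strings in column A are left alone.
df.loc[mask, "A"] = np.nan

print(df)
```

Only the original column is modified through the mask, so column A keeps its strings; the coerced series is used purely for the comparison.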

try / except

See @Jan's solution. This approach is preferable if you expect most values to be numeric and only need alternative treatment for edge cases.
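
@Jan's answer is not reproduced here, so the following is only a rough sketch of what a try / except approach could look like, not that answer's actual code. The helper name drop_small and its threshold parameter are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_small(value, threshold=-1):
    """Hypothetical helper: return NaN for numeric values <= threshold."""
    try:
        return np.nan if value <= threshold else value
    except TypeError:
        # Strings such as "abc" cannot be compared with a number; keep them.
        return value

df = pd.DataFrame({"A": [1.0, "abc", -1.11]})

# Apply the helper element by element; this is slower than a vectorised mask
# but handles each value's type individually.
df["A"] = df["A"].apply(drop_small)

print(df)
```

Because the function is called once per element, this is usually slower than the pd.to_numeric mask on large columns, but it makes the handling of the non-numeric case explicit.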

jpp answered Oct 17 '25 18:10


