
Pandas: Drop() int64 based on value returns object

Tags: python, pandas

I need to drop all rows where one column is below a certain value. I used the command below, but it returns the column as object dtype. I need to keep it as int64:

df["customer_id"] = df.drop(df["customer_id"][df["customer_id"] < 9999999].index)
df = df.dropna()

I have tried to re-cast the field as int64 afterwards, but this causes the following error, which involves data from a totally different column:

invalid literal for long() with base 10: '2014/03/09 11:12:27'
user6453877 asked Jan 21 '26 19:01



1 Answer

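I think you need boolean indexing with reset_index. drop returns a whole DataFrame, so assigning its result to the single column df["customer_id"] is most likely what changes the dtype; filter the frame itself instead: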

import pandas as pd

df = pd.DataFrame({'a': ['s', 'd', 'f', 'g'],
                   'customer_id': [99999990, 99999997, 1000, 8888]})
print(df)
   a  customer_id
0  s     99999990
1  d     99999997
2  f         1000
3  g         8888

# >= keeps a row that is exactly at the threshold; only rows below it are removed
df1 = df[df["customer_id"] >= 9999999].reset_index(drop=True)
print(df1)
   a  customer_id
0  s     99999990
1  d     99999997
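
To confirm that the column keeps its dtype, the filtered frame can be checked directly. This is only a minimal sketch, assuming the threshold 9999999 from the question and using >= so that a row exactly at the threshold is kept (the question asks to drop only rows below it). The names threshold and kept are illustrative, not from the original post:

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["s", "d", "f", "g"],
    # dtype set explicitly so the check below does not depend on platform defaults
    "customer_id": pd.Series([99999990, 99999997, 1000, 8888], dtype="int64"),
})

threshold = 9999999  # value taken from the question

# Keep the rows that are NOT below the threshold. No NaN is introduced,
# so the integer dtype of the column is left untouched.
kept = df[df["customer_id"] >= threshold].reset_index(drop=True)

print(kept["customer_id"].dtype)
```

Because the whole frame is filtered at once, no column is reassigned and no later dropna or re-cast is needed.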

A solution with drop also works, but it is slower:

df2 = df.drop(df.loc[df["customer_id"] < 9999999, 'customer_id'].index)
print(df2)
   a  customer_id
0  s     99999990
1  d     99999997
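
The drop variant can also be written in two steps, which makes it easier to see what went wrong in the question. A sketch under the same assumptions as the example data above (to_drop is a made-up name); the key point is that the result is bound back to df itself, not to df["customer_id"]:

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["s", "d", "f", "g"],
    "customer_id": pd.Series([99999990, 99999997, 1000, 8888], dtype="int64"),
})

# Index labels of the rows whose customer_id is below the threshold.
to_drop = df.loc[df["customer_id"] < 9999999].index

# drop() returns a whole DataFrame, so rebind df rather than a single column.
df = df.drop(to_drop)

print(df.dtypes)
```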

Timings:

In [12]: %timeit df[df["customer_id"] > 9999999].reset_index(drop=True)
1000 loops, best of 3: 676 µs per loop

In [13]: %timeit (df.drop(df.loc[df["customer_id"] < 9999999, 'customer_id'].index))
1000 loops, best of 3: 921 µs per loop
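
These timings were taken on a four-row frame, so they likely reflect fixed overhead more than the cost of filtering. To repeat the comparison outside IPython on a larger frame, something like the following could be used. The frame size, the random data and the function names by_mask and by_drop are made up for illustration, and no particular result is implied:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 random int64 ids.
rng = np.random.default_rng(0)
df = pd.DataFrame({"customer_id": rng.integers(0, 100_000_000, size=100_000)})


def by_mask(frame):
    # Boolean indexing: keep the rows at or above the threshold.
    return frame[frame["customer_id"] >= 9999999].reset_index(drop=True)


def by_drop(frame):
    # drop(): look up the labels of the rows below the threshold, then remove them.
    below = frame.loc[frame["customer_id"] < 9999999].index
    return frame.drop(below).reset_index(drop=True)


print("mask:", timeit.timeit(lambda: by_mask(df), number=20))
print("drop:", timeit.timeit(lambda: by_drop(df), number=20))
```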
jezrael answered Jan 24 '26 12:01



