
Handling dtype in a 'for' loop

I'm looping over the columns of a pandas DataFrame, with a nested if statement to find the minimum and maximum of each datetime column.

I can identify the datetime columns I need, but what is the correct way to pass the column variable into the Series min() and max() calls?

import pandas as pd
data = pd.somedata()

for column in data.columns:

    if data[column].dtype == 'datetime64[ns]':
        data.column.min()
        data.column.max()

So when the column variable is passed, the loop should return datetime values like this:

data.DFLT_DT.min()

Timestamp('2007-01-15 00:00:00')


data.DFLT_DT.max()

Timestamp('2016-10-18 00:00:00')
joshi123 asked Nov 30 '25 07:11


1 Answer

You can just use select_dtypes to achieve this:

In [104]:
import datetime as dt
import numpy as np
import pandas as pd
df = pd.DataFrame({'int':np.arange(5), 'flt':np.random.randn(5), 'str':list('abcde'), 'dt':pd.date_range(dt.datetime.now(), periods=5)})
df

Out[104]:
                          dt       flt  int str
0 2017-01-18 16:50:13.678037 -0.319022    0   a
1 2017-01-19 16:50:13.678037  0.400441    1   b
2 2017-01-20 16:50:13.678037  0.114614    2   c
3 2017-01-21 16:50:13.678037  1.594350    3   d
4 2017-01-22 16:50:13.678037 -0.962520    4   e

In [106]:
df.select_dtypes([np.datetime64])

Out[106]:
                          dt
0 2017-01-18 16:50:13.678037
1 2017-01-19 16:50:13.678037
2 2017-01-20 16:50:13.678037
3 2017-01-21 16:50:13.678037
4 2017-01-22 16:50:13.678037

Then you can get the min and max of just these columns:

In [108]:
for col in df.select_dtypes([np.datetime64]):
    print('column: ', col)
    print('max: ',df[col].max())
    print('min: ',df[col].min())

column:  dt
max:  2017-01-22 16:50:13.678037
min:  2017-01-18 16:50:13.678037
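If you would rather collect the results than print them, the same selection can be combined with agg. This is only a minimal sketch: the frame below and its column names (start, end) are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; the column names are illustrative only.
df = pd.DataFrame({
    "int": np.arange(5),
    "str": list("abcde"),
    "start": pd.date_range("2017-01-18", periods=5),
    "end": pd.date_range("2017-02-01", periods=5),
})

# Keep only the datetime64 columns, then compute both statistics in one call.
dt_cols = df.select_dtypes(include=[np.datetime64])
summary = dt_cols.agg(["min", "max"])

# summary has one row per statistic and one column per datetime column.
print(summary)
```

A single value can then be looked up with, for example, summary.loc['min', 'start'].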

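As for why your attempt failed: data.column looks for a column literally named "column" instead of using the loop variable, so index with data[column]. For the dtype check you can compare against dtype.name, which is a plain string: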

In [125]:

for col in df:
    if df[col].dtype.name == 'datetime64[ns]':
        print('col', col)
        print('max', df[col].max())
        print('min', df[col].min())

col dt
max 2017-01-22 16:50:13.678037
min 2017-01-18 16:50:13.678037
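Applied to the loop from the question, the fix might look like the sketch below. The data is invented: only the DFLT_DT column name and its two end dates come from the question, and the amount column and the ranges dict are hypothetical. It uses pandas.api.types.is_datetime64_any_dtype for the dtype check, which is a swapped-in alternative to the string comparison and also accepts timezone-aware datetime columns.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical data; DFLT_DT mirrors the column name used in the question.
data = pd.DataFrame({
    "DFLT_DT": pd.to_datetime(["2007-01-15", "2016-10-18", "2010-06-01"]),
    "amount": [1.0, 2.5, 3.0],
})

ranges = {}
for column in data.columns:
    # data[column] uses the loop variable; data.column would instead look
    # for a column literally named "column".
    if is_datetime64_any_dtype(data[column]):
        ranges[column] = (data[column].min(), data[column].max())

print(ranges)
```

Storing the (min, max) pairs in a dict keyed by column name keeps the results available after the loop finishes.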
EdChum answered Dec 02 '25 21:12


