count contiguous NaN values and get the start and final date of the NaN contiguous period

Question

I have contiguous periods of NaN values. I want to count NaN values from periods of contiguous NaN values, and also i want the start and end date of the contiguos period of NaN values.

df :
                         TMIN
2017-01-01 00:00:00      2.5
2017-01-02 00:00:00      NaN
2017-01-03 00:00:00      NaN
2017-01-04 00:00:00      2.2
2018-01-01 00:00:00      NaN
2018-01-02 00:00:00      NaN
2018-01-03 00:00:00      NaN
2018-01-04 00:00:00      5.0
2019-01-01 00:00:00      9.0
2019-01-02 00:00:00      8.0
2019-01-03 00:00:00      2.0
2019-01-04 00:00:00      NaN
2020-01-01 00:00:00      NaN
2020-01-02 00:00:00      NaN
2020-01-03 00:00:00      1.0
2020-01-04 00:00:00      NaN


Expected results :
       Start_Date      End date               number of contiguous missing values 
2017-01-02 00:00:00    2017-01-03 00:00:00                2
2018-01-01 00:00:00    2018-01-03 00:00:00                3
2019-01-04 00:00:00    2019-01-04 00:00:00                1
2020-01-01 00:00:00    2020-01-02 00:00:00                2
2020-01-04 00:00:00    2020-01-04 00:00:00                1

How can i solve this? Thanks in advance.

Shubham Sharma · Accepted Answer

CODE:

s = df.index.to_series()
m1, m2 = s.diff().dt.days.eq(1), df['TMIN'].isna()
out = s[m2].groupby([(~m1).cumsum(), (~m2).cumsum()])\
           .agg(['first', 'last', 'count']).reset_index(drop=True)

DETAILS:

Create boolean masks m1 and m2 such that m1 represents the condition where the difference between successive dates is 1 days and m2 represent the condition where the column TMIN contains NaN values:

>>> m1 
2017-01-01    False
2017-01-02     True
2017-01-03     True
2017-01-04     True
2018-01-01    False
2018-01-02     True
2018-01-03     True
2018-01-04     True
2019-01-01    False
2019-01-02     True
2019-01-03     True
2019-01-04     True
2020-01-01    False
2020-01-02     True
2020-01-03     True
2020-01-04     True
dtype: bool

>>> m2
2017-01-01    False
2017-01-02     True
2017-01-03     True
2017-01-04    False
2018-01-01     True
2018-01-02     True
2018-01-03     True
2018-01-04    False
2019-01-01    False
2019-01-02    False
2019-01-03    False
2019-01-04     True
2020-01-01     True
2020-01-02     True
2020-01-03    False
2020-01-04     True
Name: TMIN, dtype: bool

Use cumsum with the above boolean masks to identify the blocks of contiguous dates and NaN values:

>>> (~m1).cumsum()
2017-01-01    1
2017-01-02    1
2017-01-03    1
2017-01-04    1
2018-01-01    2
2018-01-02    2
2018-01-03    2
2018-01-04    2
2019-01-01    3
2019-01-02    3
2019-01-03    3
2019-01-04    3
2020-01-01    4
2020-01-02    4
2020-01-03    4
2020-01-04    4
dtype: int64

>>> (~m2).cumsum()
2017-01-01    1
2017-01-02    1
2017-01-03    1
2017-01-04    2
2018-01-01    2
2018-01-02    2
2018-01-03    2
2018-01-04    3
2019-01-01    4
2019-01-02    5
2019-01-03    6
2019-01-04    6
2020-01-01    6
2020-01-02    6
2020-01-03    7
2020-01-04    7
Name: TMIN, dtype: int64

Finally group the index of the datframe on the above blocks and aggregate using first, last and count to get the result:

>>> out

       first       last  count
0 2017-01-02 2017-01-03      2
1 2018-01-01 2018-01-03      3
2 2019-01-04 2019-01-04      1
3 2020-01-01 2020-01-02      2
4 2020-01-04 2020-01-04      1

anky · Answer

trying something different with more_iterools (standard python lib)

from more_itertools import consecutive_groups

m = df[df['TMIN'].isna()]
l = [list(i) for i in consecutive_groups(m.index.map(pd.Timestamp.toordinal))]

d = {pd.Timestamp.fromordinal(ele):e for e,item in enumerate(l) for ele in item}

out = (pd.Series(m.index,index=m.index.map(d))
        .groupby(level=0).agg(['min','max','count']))
#out.columns = ['Start_Date','End date','number of contiguous missing values']

print(out)

         min        max  count
0 2017-01-02 2017-01-03      2
1 2018-01-01 2018-01-03      3
2 2019-01-04 2019-01-04      1
3 2020-01-01 2020-01-02      2
4 2020-01-04 2020-01-04      1

count contiguous NaN values and get the start and final date of the NaN contiguous period

Tags:

python

pandas

Javier

2 Answers

Shubham Sharma

anky

Recent Activity

Donate For Us

count contiguous NaN values and get the start and final date of the NaN contiguous period

Tags:

python

pandas

Javier

2 Answers

Shubham Sharma

anky

Related questions

Recent Activity

Donate For Us