Count values in each month and fill with NaN if below a certain limit

I am working with a dataframe in which every column represents a company. The index is a datetime index with daily frequency. My problem is the following: for each company, I would like to fill a month with NaN if that company has fewer than 20 values in that month. In the example below, this would mean that Company_1's entry 0.91 on 2012-08-31 would be changed to NaN, while Company_2 and Company_3 would be unchanged.

               Company_1      Company_2   Company_3
2012-08-01     NaN            0.99        0.11
2012-08-02     NaN            0.21        NaN
2012-08-03     NaN            0.32        0.40
...            ...            ...         ...
2012-08-29     NaN            0.50       -0.36
2012-08-30     NaN            0.48       -0.32
2012-08-31     0.91           0.51       -0.33

Total values:  1              22          21

I am struggling to find an efficient way to count the number of values in each month for each stock. In theory I could write a function that builds a new dataframe holding the number of values per month and per stock, and then use that dataframe to mask the original company data, but I am sure there has to be an easier way. Any help is highly appreciated. Thanks in advance.

Sanoj asked Jan 31 '26 03:01


2 Answers

Group the dataframe by month with groupby and use transform('count') to get, for every row, the number of non-null values its column has in that month. Then use lt(20) to build a boolean mask and pass it to DataFrame.mask to set the flagged values to NaN:

# Note: pandas 2.2+ deprecates the 'M' alias here in favour of 'ME' (month end)
df1 = df.mask(df.groupby(pd.Grouper(freq='M')).transform('count').lt(20))

print(df1)
            Company_1  Company_2  Company_3
2012-08-01        NaN       0.99       0.11
2012-08-02        NaN       0.21        NaN
2012-08-03        NaN       0.32       0.40
....
2012-08-29        NaN       0.50      -0.36
2012-08-30        NaN       0.48      -0.32
2012-08-31        NaN       0.51      -0.33
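
To make the approach above easy to try on a frame that spans more than one month, here is a minimal self-contained sketch. The helper name mask_sparse_months, the min_count parameter and the random sample data are all made up for illustration and are not from the question. It groups on df.index.to_period("M") instead of pd.Grouper, which is a swapped-in but equivalent way of grouping by calendar month that does not depend on the 'M'/'ME' alias.

```python
import numpy as np
import pandas as pd


def mask_sparse_months(df: pd.DataFrame, min_count: int = 20) -> pd.DataFrame:
    """Return a copy where each column is NaN in every month in which
    it has fewer than `min_count` non-null values."""
    # One calendar-month label per row of the DatetimeIndex
    months = df.index.to_period("M")
    # Same shape as df: per-column count of non-null values in the row's month
    counts = df.groupby(months).transform("count")
    # Replace values with NaN wherever the monthly count is below the limit
    return df.mask(counts < min_count)


# Hypothetical sample data: business days of August and September 2012
idx = pd.bdate_range("2012-08-01", "2012-09-28")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(len(idx), 2)),
    index=idx,
    columns=["Company_1", "Company_2"],
)
# Company_1 keeps a single August value (on the 31st) and a full September
df.loc["2012-08-01":"2012-08-30", "Company_1"] = np.nan

result = mask_sparse_months(df)
print(result.loc["2012-08-29":"2012-09-04"])
```

With this sample, Company_1 loses its lone August value but keeps all of September, and Company_2 is left untouched.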
Shubham Sharma answered Feb 02 '26 15:02


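If I understand correctly, and the dataframe covers a single month, you can count the non-null values in each column and set every column with fewer than 20 of them to NaN:

df.loc[:, df.apply(lambda d: d.notnull().sum()<20)] = np.nan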

print(df)

            Company_1  Company_2  Company_3
2012-08-01        NaN       0.99       0.11
2012-08-02        NaN       0.21        NaN
2012-08-03        NaN       0.32       0.40
....
2012-08-29        NaN       0.50      -0.36
2012-08-30        NaN       0.48      -0.32
2012-08-31        NaN       0.51      -0.33
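
One thing to keep in mind with a column-wide count is that it looks at the whole frame at once, so it only matches the per-month requirement when the data covers a single month. The sketch below uses made-up sample data (one hypothetical company with a single value in August and a full September) to show where the two ways of counting diverge.

```python
import numpy as np
import pandas as pd

# Hypothetical data: business days of August and September 2012
idx = pd.bdate_range("2012-08-01", "2012-09-28")
df = pd.DataFrame({"Company_1": 1.0}, index=idx)
# Only one August value survives (the 31st); September is complete
df.loc["2012-08-01":"2012-08-30", "Company_1"] = np.nan

# Column-wide count: 1 (August) + 20 (September) = 21, so nothing is masked
by_column = df.copy()
by_column.loc[:, df.notna().sum() < 20] = np.nan

# Per-month count: August has 1 value (< 20) and is masked, September is kept
monthly_counts = df.groupby(df.index.to_period("M")).transform("count")
by_month = df.mask(monthly_counts < 20)

last_aug_day = pd.Timestamp("2012-08-31")
print(by_column.loc[last_aug_day, "Company_1"])
print(by_month.loc[last_aug_day, "Company_1"])
```

So for data spanning several months, the count needs to be grouped by month before comparing it to the limit.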
Henry Yik answered Feb 02 '26 17:02


