Pandas bug when trying to loc data on a quarter

With a dataframe that has a datetime index, I am used to selecting a quarter's data with syntax such as df.loc["2014-Q1"], which grabs the data for the first quarter of 2014 (Jan, Feb, Mar).

This works in most cases, but I ran into a problem when I used a resampled dataframe. I am unsure whether this is expected behaviour from pandas or a corner-case bug.

I am using pandas 2.1.1 in python 3.12.

The following code produces the expected result:

import pandas as pd

df = pd.DataFrame(index=pd.date_range(start="2014-01-01", end="2023-01-01", freq="M"))
df.loc["2014-Q1"]

It returns the expected dataframe (no columns, with the index restricted to the first quarter of 2014):

Empty DataFrame
Columns: []
Index: [2014-01-31 00:00:00, 2014-02-28 00:00:00, 2014-03-31 00:00:00]

However, after resampling I get unexpected behaviour. The following raises an error:

df.resample("QS").sum().loc["2014-Q1"]

The traceback says, essentially, that the key cannot be found:

File ~/anaconda3/envs/py3/lib/python3.12/site-packages/pandas/core/indexes/datetimes.py:613, in DatetimeIndex.get_loc(self, key)
    611             return self._partial_date_slice(reso, parsed)
    612         except KeyError as err:
--> 613             raise KeyError(key) from err
    615     key = parsed
    617 elif isinstance(key, dt.timedelta):
    618     # GH#20464

KeyError: '2014-Q1'

When I started digging into this, I found that df.loc[f"{year}-Q{quarter}"] can in fact look up data for the previous year. Because my dataframe has no index entries for 2013, the lookup for "2014-Q1" finds nothing and raises the KeyError.

Using the same minimal example, I tried

df.resample("QS").sum().loc["2015-Q1"]

and the data it returns is for 2014!

Empty DataFrame
Columns: []
Index: [2014-01-01 00:00:00]

Is this normal behaviour after the resampling, or is it a bug in pandas?

Someone1348 asked Dec 07 '25 16:12


1 Answer

This looks like a pandas bug related to the index's freq attribute. After the resampling, df.index.freq is set to QS-JAN. Setting df.index.freq = None on the resampled dataframe before calling loc solves the issue.
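
A minimal sketch of that workaround, under a few assumptions: the "value" column and the month-start ("MS") index are illustrative additions that are not in the original example, chosen so the quarterly sums are visible. It also shows a boolean mask on the index's year and quarter, which sidesteps string parsing entirely.

```python
import pandas as pd

# Hypothetical data: one row per month, with an illustrative "value" column.
idx = pd.date_range(start="2014-01-01", end="2022-12-01", freq="MS")
df = pd.DataFrame({"value": 1}, index=idx)

# Resampling leaves a quarterly frequency attached to the new index.
quarterly = df.resample("QS").sum()

# Workaround described above: drop the frequency before using .loc.
quarterly.index.freq = None
by_string = quarterly.loc["2014-Q1"]

# Alternative that does not rely on partial-string parsing at all.
mask = (quarterly.index.year == 2014) & (quarterly.index.quarter == 1)
by_mask = quarterly[mask]

print(by_string)
print(by_mask)
```

Both selections should contain the single row labelled 2014-01-01. The mask version is more verbose, but it does not depend on how the index frequency influences the parsing of the quarter string.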

I have raised an issue in the pandas repo: https://github.com/pandas-dev/pandas/issues/58255

Someone1348 answered Dec 09 '25 15:12


