 

Pandas datetime anchored offset for (-) MonthBegin doesn't work as expected

Tags:

pandas

I need to move back to the beginning of the month, but if I'm already at the beginning I want to stay there. A pandas anchored offset with n=0 is supposed to do exactly that, but subtracting MonthBegin(n=0) doesn't produce the expected result for dates between the anchor points.

For example, I expect pd.Timestamp('2017-01-06 00:00:00') - pd.tseries.offsets.MonthBegin(n=0) to move me back to Timestamp('2017-01-01 00:00:00'), but instead I get Timestamp('2017-02-01 00:00:00'). What am I doing wrong? Or do you think it's a bug?

I can also see that the same rule works fine for MonthEnd, so by combining the two as pd.Timestamp('2017-01-06 00:00:00')+pd.tseries.offsets.MonthEnd(n=0)-pd.tseries.offsets.MonthBegin(n=1) I get the desired Timestamp('2017-01-01 00:00:00'). My expectation, though, was that it would work with just - pd.tseries.offsets.MonthBegin(n=0).

Lamakaha asked Feb 02 '17 17:02




2 Answers

To jump to the month's start, use:

ts + pd.tseries.offsets.MonthEnd(n=0) - pd.tseries.offsets.MonthBegin(n=1)

Yes, it's ugly, but it jumps to the first of the month while staying put if ts is already the first, which subtracting MonthBegin on its own does not do.

Quick demo:

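>>> import datetime as dt
>>> import pandas as pd
>>> from pandas.tseries.offsets import MonthBegin, MonthEnd
>>> pd.date_range(dt.datetime(2016,12,30), dt.datetime(2017,2,2)).to_series() \
...         + MonthEnd(n=0) - MonthBegin(n=1)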

2016-12-30   2016-12-01
2016-12-31   2016-12-01
2017-01-01   2017-01-01
2017-01-02   2017-01-01
...
2017-01-31   2017-01-01
2017-02-01   2017-02-01
2017-02-02   2017-02-01
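
As an alternative that is not part of the original answer, the same "go to the first of the month, but stay if already there" behavior can be sketched with the offset's rollback method for a single timestamp, or with a monthly period round-trip for a whole Series. The helper name month_start is hypothetical, and the example assumes midnight timestamps.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin


def month_start(ts):
    """Hypothetical helper: roll ts back to the first of its month.

    rollback() only moves the date when it is not already on an anchor
    point, so a timestamp on the first of the month is returned unchanged.
    """
    return MonthBegin().rollback(ts)


mid_month = month_start(pd.Timestamp("2017-01-06"))
on_anchor = month_start(pd.Timestamp("2017-01-01"))
print(mid_month)   # 2017-01-01 00:00:00
print(on_anchor)   # 2017-01-01 00:00:00

# Series version: convert to monthly periods, then back to the period start.
dates = pd.date_range("2016-12-30", "2017-02-02").to_series()
starts = dates.dt.to_period("M").dt.to_timestamp()
print(starts.head(3))
```

Both variants avoid the MonthEnd/MonthBegin combination; whether they read better is a matter of taste.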
Christian Aichinger answered Oct 06 '22 06:10



This is indeed the documented behavior. It follows from the rules laid out in Anchored Offset Semantics for offsets that are anchored to the start or end of a particular frequency.

Consider the given example:

import pandas as pd
from pandas.tseries.offsets import MonthBegin

pd.Timestamp('2017-01-02 00:00:00') - MonthBegin(n=0)
Out[18]:
Timestamp('2017-02-01 00:00:00')

Note that the anchor point of the MonthBegin offset is the first of every month. Since the given timestamp is past that day, it is not on an anchor point, and with n=0 it is rolled forward to the next anchor point, the first of the following month. The minus sign makes no difference here: negating an offset with n=0 still gives n=0, so subtracting it behaves the same as adding it.

Excerpt from the docs:
For the case when n=0, the date is not moved if on an anchor point, otherwise it is rolled forward to the next anchor point.
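
The rule quoted above can be checked directly. The following is a small sketch (the variable names are only for illustration) showing that with n=0 adding and subtracting MonthBegin give the same result, and that a date on the anchor point is left alone.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

ts = pd.Timestamp("2017-01-06")      # not on an anchor point
anchor = pd.Timestamp("2017-01-01")  # on an anchor point

added = ts + MonthBegin(n=0)         # rolled forward to the next anchor
subtracted = ts - MonthBegin(n=0)    # also rolled forward: -0 is still 0
stays = anchor - MonthBegin(n=0)     # already on an anchor, not moved

negated = -MonthBegin(n=0)           # negation multiplies n by -1

print(added)        # 2017-02-01 00:00:00
print(subtracted)   # 2017-02-01 00:00:00
print(stays)        # 2017-01-01 00:00:00
print(negated.n)    # 0
```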


To roll a date that lies inside the month back to the first, you need to provide n=1. (Be aware that n=1 moves a date that is already on the anchor point to the first of the previous month.)

pd.Timestamp('2017-01-02 00:00:00') - MonthBegin(n=1)
Out[20]:
Timestamp('2017-01-01 00:00:00')

If the date is already on the exact anchor point, n=0 leaves it unchanged, as the docs excerpt above says.

pd.Timestamp('2017-01-01 00:00:00') - MonthBegin(n=0)
Out[21]:
Timestamp('2017-01-01 00:00:00')
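
One caveat is worth spelling out with a short sketch: subtracting MonthBegin(n=1) gives the first of the current month only for dates inside the month. For a date that already sits on the anchor point it steps back a full month, so n=1 alone does not give the "stay there" behavior from the question. The variable names below are illustrative only.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Inside the month: n=1 rolls back to the first of the same month.
inside = pd.Timestamp("2017-01-02") - MonthBegin(n=1)

# Already on the anchor: n=1 moves to the first of the previous month.
on_anchor = pd.Timestamp("2017-01-01") - MonthBegin(n=1)

print(inside)      # 2017-01-01 00:00:00
print(on_anchor)   # 2016-12-01 00:00:00
```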
Nickil Maveli answered Oct 06 '22 04:10
