I'm working with a day-of-year column that ranges from 1 to 366 (to account for leap years). I need to convert this column into a date for a specific task, and I would like to set it to a year that is very unlikely to appear in my time series.
Is there a way to set it to the oldest leap year supported by pandas?
import pandas as pd
# here is an example where the data have already been converted to datetime objects
# I'm only missing the year to set
dates = pd.Series(pd.to_datetime(['2023-05-01', '2021-12-15', '2019-07-20']))
first_leap_year = 2000 # this is where I don't know what to set
new_dates = dates.apply(lambda d: d.replace(year=first_leap_year))
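For context, this is why the target year has to be a leap year: day 366 (and February 29) only exists in leap years, so any non-leap year would reject it. A minimal sketch; `feb29_exists` is a hypothetical helper, and the years 2000 and 2021 are only illustrative.

```python
import pandas as pd

def feb29_exists(year):
    """Hypothetical helper: True if a Timestamp for February 29 can be built in `year`."""
    try:
        pd.Timestamp(year=year, month=2, day=29)
        return True
    except ValueError:
        # Date validation rejects February 29 in a non-leap year
        return False

print(feb29_exists(2000))  # True
print(feb29_exists(2021))  # False
```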
The documentation for the pandas.Timestamp type says:
Timestamp is the pandas equivalent of python’s Datetime and is interchangeable with it in most cases.
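A quick sketch of what "interchangeable" means in practice: `pd.Timestamp` is a subclass of `datetime.datetime`, so the standard library's calendar rules apply to it, and it converts back and forth with a plain `datetime`.

```python
import datetime
import pandas as pd

ts = pd.Timestamp("2023-05-01")
print(isinstance(ts, datetime.datetime))  # True: Timestamp subclasses datetime

py_dt = ts.to_pydatetime()  # plain datetime.datetime
print(pd.Timestamp(py_dt) == ts)  # True: round trip preserves the value
```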
So we can look up the Python documentation for datetime objects, where we find:
Like a date object, datetime assumes the current Gregorian calendar extended in both directions; like a time object, datetime assumes there are exactly 3600*24 seconds in every day.
In other words, it assumes that the current rules for calculating leap years apply at any point in history, even though they were actually introduced in 1582, and adopted by different countries over the next few centuries. (The technical term for this is a "proleptic Gregorian calendar".)
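A small illustration of the proleptic Gregorian rule, using only the standard library: 1500 was a leap year under the Julian calendar actually in use at the time, but Python applies the Gregorian rule retroactively and rejects February 29, 1500.

```python
import calendar
import datetime

# 1500 is divisible by 4 (Julian leap year) but also by 100 and not by 400,
# so the proleptic Gregorian calendar treats it as a common year.
print(calendar.isleap(1500))  # False
try:
    datetime.date(1500, 2, 29)
except ValueError as exc:
    print("rejected:", exc)

# 1600 is divisible by 400, so it is a leap year under both calendars.
print(datetime.date(1600, 2, 29))  # 1600-02-29
```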
Standard Python has a datetime.MINYEAR constant:
The smallest year number allowed in a date or datetime object. MINYEAR is 1.
So the lowest year divisible by 4 (and not by 100, so it meets the Gregorian definition of a leap year as well as the Julian one) would be 4.
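The same reasoning can be checked with the standard library, scanning upward from `datetime.MINYEAR` with `calendar.isleap` (which implements the Gregorian rule):

```python
import calendar
import datetime

# Scan upward from the smallest year datetime supports (MINYEAR == 1).
first_leap = next(
    year
    for year in range(datetime.MINYEAR, datetime.MINYEAR + 8)
    if calendar.isleap(year)
)
print(first_leap)  # 4
print(datetime.date(first_leap, 2, 29))  # 0004-02-29
```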
However, pandas also has a stricter lower bound, pandas.Timestamp.min:
Timestamp.min = Timestamp('1677-09-21 00:12:43.145224193')
(In case you're wondering, that's just under 2^63 nanoseconds before January 1, 1970, i.e. the limit of a 64-bit signed integer counting nanoseconds.)
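A sketch to verify that arithmetic, assuming the default nanosecond-resolution `Timestamp`: `.value` gives nanoseconds since the epoch, and the minimum sits one above the lowest `int64`, because pandas reserves that lowest value for `NaT`.

```python
import numpy as np
import pandas as pd

ns_min = pd.Timestamp.min.value     # nanoseconds since 1970-01-01, as an int64
int64_min = np.iinfo(np.int64).min  # -2**63, the value pandas uses for NaT

print(ns_min - int64_min)  # 1
print(pd.Timestamp(ns_min))  # back to the minimum Timestamp in 1677
```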
So you probably want a year after 1677, meaning the earliest available leap year would be 1680.
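Applied to the original question, a minimal sketch: anchor every day-of-year value at January 1, 1680 and add `doy - 1` days. The `doy` Series below is a hypothetical stand-in for the real column.

```python
import pandas as pd

# Hypothetical day-of-year values (1..366) standing in for the real column.
doy = pd.Series([1, 60, 366])

# 1680 is the first leap year fully inside pandas' nanosecond Timestamp range.
base = pd.Timestamp("1680-01-01")
dates = base + pd.to_timedelta(doy - 1, unit="D")
print(dates)  # 1680-01-01, 1680-02-29, 1680-12-31

# Round trip: the day of year comes back unchanged.
print(dates.dt.dayofyear.tolist())  # [1, 60, 366]
```

Adding a timedelta rather than parsing strings keeps the conversion vectorized, and because 1680 is a leap year, day 366 maps to December 31 instead of spilling into 1681.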
I ended up following the same reasoning as @IMSoP, but using only pandas, so it is less general:
First, I looked up the earliest timestamp available in pandas:
import pandas as pd
start_date = pd.Timestamp.min
Then I wrote a small function to check whether a year is a leap year and applied it to the years from that first year through the next 50 (which is of course overkill, since consecutive Gregorian leap years are never more than 8 years apart, but I felt safer):
def is_leap_year(year):
    # Gregorian rule: divisible by 4, except centuries not divisible by 400
    return (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0)

for year in range(start_date.year, start_date.year + 50):
    if is_leap_year(year):
        print(f"Oldest leap year in pandas: {year}")
        break
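pandas can also answer the leap-year question itself through the `Timestamp.is_leap_year` property, so a hand-written rule isn't strictly needed. A sketch; `first_leap_year_in_pandas` is a hypothetical helper, and December 31 is used because it falls inside the valid range even for the partial first year, 1677.

```python
import calendar
import pandas as pd

def first_leap_year_in_pandas(max_scan=8):
    """Hypothetical helper: first leap year at or after pandas' minimum Timestamp year."""
    start = pd.Timestamp.min.year  # 1677
    for year in range(start, start + max_scan + 1):
        # December 31 is in bounds even for 1677, whose range starts in September.
        if pd.Timestamp(year=year, month=12, day=31).is_leap_year:
            return year
    return None

year = first_leap_year_in_pandas()
print(year)  # 1680
print(calendar.isleap(year))  # True: agrees with the standard library
```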
The final answer is consistent with the datetime-based answer (which is great):
Oldest leap year in pandas: 1680