Microsoft has just announced that a software error in calculating dates around the leap year caused a major outage in Windows Azure last week.
Was it really just a simple misjudgement around DateTime.Now.AddYears(1) on a leap year?
What coding practices could have prevented this?
EDIT
As dcstraw pointed out, DateTime.Now.AddYears(1) on a leap year does in fact return the correct date in .NET. So it's not a framework bug, but evidently a bug in how the dates were calculated.
Shameless plug:
Use a better date and time API
The built-in .NET date and time libraries are horribly hard to use properly. They do let you do everything you need, but you can't express yourself clearly through the type system. DateTime is a mess; DateTimeOffset may lull you into thinking you're actually preserving the time zone information when you're not; and TimeZoneInfo doesn't force you to think about everything you ought to be considering.
None of these provide a nice way of saying "just a time of day" or "just a date", nor do they make a clear distinction between "local time" and "time in a particular time zone". And if you want to use a calendar other than the Gregorian one, you need to go through the Calendar class the whole time.
All of this is why I'm building Noda Time - an alternative date and time library built on a port of the Joda Time "engine" but with a new (and leaner) API on top.
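To make the complaints about the built-in types concrete, here is a small sketch using only System types (the class and method names are invented for illustration). A DateTime built from components carries DateTimeKind.Unspecified, so nothing records whether it is local time or UTC, and a DateTimeOffset stores only a numeric offset, not the time zone that produced it.

```csharp
using System;

// Hypothetical helper class illustrating what the built-in types do and don't record.
public static class BuiltInTypePitfalls
{
    // A DateTime created from date/time components has Kind == Unspecified:
    // the value doesn't say whether it is local time, UTC, or something else.
    public static DateTimeKind KindOfConstructedValue() =>
        new DateTime(2012, 2, 29, 9, 0, 0).Kind;

    // Two DateTimeOffset values for the same instant, written with different offsets.
    // == compares the instants (true here); EqualsExact also compares the offsets.
    public static (bool sameInstant, bool exactlyEqual) CompareOffsets()
    {
        var minusFive = new DateTimeOffset(2012, 2, 29, 9, 0, 0, TimeSpan.FromHours(-5));
        var utc = new DateTimeOffset(2012, 2, 29, 14, 0, 0, TimeSpan.Zero);
        return (minusFive == utc, minusFive.EqualsExact(utc));
    }

    // Only the offset survives: -05:00 could be US Eastern standard time,
    // US Central daylight time, or a zone that never observes DST.
    public static TimeSpan StoredOffset() =>
        new DateTimeOffset(2012, 2, 29, 9, 0, 0, TimeSpan.FromHours(-5)).Offset;
}
```

Nothing in either value tells you which time zone rules to apply when you later add a day or a month, which is the gap a richer API is meant to close.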
Some points you may want to think about, which are easy to miss if you're not aware of them:
… TimeZoneInfo is generally willing to reveal, frankly. (It doesn't support a time zone whose idea of "standard time" changes over time, or which goes into permanent daylight saving time.)
As far as specific development practices:
… DateTime.Now or DateTime.UtcNow; it makes it easier (feasible!) to unit test.
It's worth noting that the bug probably wasn't due to a line like you posted:
DateTime.Now.AddYears(1)
That doesn't create an invalid date. If you run:
(new DateTime(2012, 2, 29)).AddYears(1)
you get Feb 28, 2013. I don't know what language Azure's guest agent is written in, but it must have been a different call that failed. A bad way to have done this in .NET would have been:
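A quick way to check the behaviour described above (a sketch; the class name is invented). DateTime.AddYears and DateTime.AddMonths clamp to the last valid day of the target month instead of producing an invalid date. One consequence worth knowing is that the operations don't round-trip.

```csharp
using System;

// Hypothetical class demonstrating how the built-in arithmetic handles month ends.
public static class LeapDayArithmetic
{
    static readonly DateTime LeapDay = new DateTime(2012, 2, 29);

    // AddYears(1) from 29 Feb 2012 lands on 28 Feb 2013: the day is clamped
    // because 2013 is not a leap year.
    public static DateTime OneYearLater() => LeapDay.AddYears(1);

    // The clamping loses information: forward a year and back again
    // gives 28 Feb 2012, not the original leap day.
    public static DateTime ThereAndBack() => LeapDay.AddYears(1).AddYears(-1);

    // AddMonths clamps the same way: 31 Jan 2012 plus one month is 29 Feb 2012.
    public static DateTime EndOfJanuaryPlusOneMonth() =>
        new DateTime(2012, 1, 31).AddMonths(1);
}
```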
new DateTime(today.Year + 1, today.Month, today.Day)
That throws an exception if today is leap day. However the Microsoft blog about the Azure issue said that they created an invalid date of Feb 29, 2013, which I'm not sure is possible to do with DateTime in .NET.
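The failing pattern is easy to reproduce. The sketch below (the class and method names are hypothetical) contrasts building "same date next year" from components, which throws ArgumentOutOfRangeException on 29 February, with a version that first clamps the day using DateTime.DaysInMonth. For dates without a time component, the clamped version gives the same answer as AddYears(1).

```csharp
using System;

// Hypothetical helpers contrasting a fragile and a safe "same date next year".
public static class NextYear
{
    // Fragile: the DateTime constructor throws ArgumentOutOfRangeException
    // when asked for 29 February in a non-leap year.
    public static DateTime Naive(DateTime today) =>
        new DateTime(today.Year + 1, today.Month, today.Day);

    // Safe: clamp the day to the length of the target month before constructing.
    public static DateTime Clamped(DateTime today)
    {
        int year = today.Year + 1;
        int day = Math.Min(today.Day, DateTime.DaysInMonth(year, today.Month));
        return new DateTime(year, today.Month, day);
    }

    // Returns true if the naive version throws for the given date.
    public static bool NaiveThrows(DateTime today)
    {
        try
        {
            Naive(today);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

In practice simply calling AddYears(1) is the simpler fix; the clamped helper just makes the rule explicit.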
I'm not saying that DateTime and DateTimeOffset aren't error-prone, just that I don't think they would have caused this particular issue.
How can we develop coding practices designed to protect against leap year bugs? What coding practices could have prevented this?
Unit testing specific dates, as John mentioned, is one coding practice that will help; however, nothing beats what I call a 'manual integration test':
Change the clock on your development/testbed server and watch what happens when the time ticks over.
Don't get bogged down in whether this counts as a 'coding practice'. Obviously you can't do this for every date on the calendar, so pick the dates you are concerned about, be that 29 February, end-of-month dates or daylight saving changeover dates.
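The advice about not calling DateTime.Now or DateTime.UtcNow directly pairs well with unit testing specific dates: if code asks an injected clock for the time, a test can pin "now" to 29 February without touching the server's system clock. This is a minimal sketch; IClock, SystemClock, FixedClock and ExpiryCalculator are hypothetical names, not part of .NET.

```csharp
using System;

// Hypothetical abstraction over "what time is it now?" so tests can control it.
public interface IClock
{
    DateTime UtcNow { get; }
}

// Production implementation: delegates to the real system clock.
public sealed class SystemClock : IClock
{
    public DateTime UtcNow => DateTime.UtcNow;
}

// Test implementation: always reports the instant it was constructed with.
public sealed class FixedClock : IClock
{
    private readonly DateTime _utcNow;
    public FixedClock(DateTime utcNow) => _utcNow = utcNow;
    public DateTime UtcNow => _utcNow;
}

// Hypothetical consumer: computes an end date one year from "today" (UTC).
public sealed class ExpiryCalculator
{
    private readonly IClock _clock;
    public ExpiryCalculator(IClock clock) => _clock = clock;

    // AddYears clamps 29 Feb to 28 Feb in non-leap years instead of throwing.
    public DateTime ValidUntil() => _clock.UtcNow.Date.AddYears(1);
}
```

A test can then construct ExpiryCalculator with a FixedClock set to 29 Feb 2012 and check that ValidUntil() returns 28 Feb 2013, while production code passes a SystemClock.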