I'm generating a string representation of the current time in the local time zone for my logging system. I have an "old" version, and I wanted to see if I could improve its performance. Old version:
const auto now = std::chrono::system_clock::now();
const std::time_t t_c = std::chrono::system_clock::to_time_t(now);
struct tm loc;
localtime_s(&loc, &t_c);
std::stringstream ss;
ss << std::put_time(&loc, "%F %T.");
ss << std::setfill('0') << std::setw(6) << std::chrono::duration_cast<std::chrono::microseconds>(now.time_since_epoch()).count() % 1'000'000;
return ss.str();
New version:
const auto now = std::chrono::system_clock::now();
auto localNow = std::chrono::current_zone()->to_local(now);
return std::format("{0:%F} {0:%R}:{0:%S}", std::chrono::time_point_cast<std::chrono::microseconds>(localNow));
It turns out the new version is ~5 times slower.
I'm using Visual Studio 17.12.3 with /O2 and Whole Program Optimization. My CPU is an Intel 12700K.
Using the built-in performance profiler (sampling), it looks like the to_local call and the std::format call each take longer than the entire old version.
Why is the new version slower, and how can I further optimize the old version?
There are two main reasons why the "new" version is significantly slower than the "old" one:
1. std::chrono::current_zone()->to_local(now). This performs a full IANA time zone database lookup and applies the relevant UTC offset and DST rules. On Windows, MSVC's implementation loads that data through the ICU library shipped with the OS; on POSIX systems it is read from the tzdata files. Even though the database is cached after the first lookup, every call still binary-searches the in-memory transition tables and computes the offset. This is the modern approach and avoids many of the pitfalls of the old one (one small mitigation is sketched after this list). By contrast, localtime_s(&loc, &t_c) is a thin wrapper over the CRT's process-wide time zone state, which is far simpler and therefore much faster.
std::format("{0:%F} {0:%R}:{0:%S}", /*…*/);std::format uses a generic formatting engine that dispatches through std::formatter specializations for dates, performing locale checks, width and fill handling, and so on. Even though the format string is parsed at compile-time, the runtime still must traverse formatting tables and apply all formatting rules. So, even known compilers optimize <format> a lot, std::put_time(&loc, "%F %T.") is still faster, because it is time-specific formatting wrapping over strftime, which is plain C and optimized heavily for date and time formats.
Generally speaking, the "new" version is the modern, more portable solution, and that convenience comes at a cost in performance.
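If you want to quantify that cost with wall-clock numbers rather than the sampling profiler, a tiny harness is enough. In the sketch below, formatTimeOld and formatTimeNew are placeholder names for your two snippets wrapped into functions returning std::string:
#include <chrono>
#include <cstdio>
#include <string>

std::string formatTimeOld();   // the "old" snippet wrapped into a function
std::string formatTimeNew();   // the "new" snippet wrapped into a function

template <typename Fn>
double nsPerCall(Fn fn, int iterations = 200'000)
{
    std::size_t checksum = 0;                   // keep the results observable
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        checksum += fn().size();
    const auto stop = std::chrono::steady_clock::now();
    std::printf("checksum: %zu\n", checksum);   // stops the loop being optimized away
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    return static_cast<double>(ns) / iterations;
}

int main()
{
    std::printf("old: %.0f ns/call\n", nsPerCall(&formatTimeOld));
    std::printf("new: %.0f ns/call\n", nsPerCall(&formatTimeNew));
}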
If performance is strictly the goal, the "old" solution can still be optimized further by:
- Dropping std::stringstream entirely. It can be replaced with std::format_to or fmt::memory_buffer; for maximal performance, consider snprintf into a fixed buffer directly (see the sketch after this list).
- Caching the tzinfo and statically allocating the char buffers to avoid dynamic allocation altogether.
Each of these optimizations comes at the cost of reduced readability and maintainability of the code.
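For illustration, here is one possible shape of that optimized old version: an untested sketch that assumes the MSVC localtime_s signature used in your original code, formats into a fixed stack buffer, and allocates only once for the returned std::string:
// Needs <chrono>, <ctime>, <cstdio>, <string>.
const auto now = std::chrono::system_clock::now();
const std::time_t t_c = std::chrono::system_clock::to_time_t(now);
std::tm loc{};
localtime_s(&loc, &t_c);

char buf[32];
// "YYYY-MM-DD HH:MM:SS." -> 20 characters
const std::size_t len = std::strftime(buf, sizeof(buf), "%F %T.", &loc);

// Append the microseconds, zero-padded to six digits.
const auto micros = std::chrono::duration_cast<std::chrono::microseconds>(
                        now.time_since_epoch()).count() % 1'000'000;
std::snprintf(buf + len, sizeof(buf) - len, "%06lld", static_cast<long long>(micros));

return std::string(buf);   // single allocation for the result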