I'm generating a string representation of the current time in the local time zone for my logging system. I have an "old" version, and I wanted to see if I could improve its performance. Old version:
const auto now = std::chrono::system_clock::now();
const std::time_t t_c = std::chrono::system_clock::to_time_t(now);
struct tm loc;
localtime_s(&loc, &t_c);
std::stringstream ss;
ss << std::put_time(&loc, "%F %T.");
ss << std::setfill('0') << std::setw(6) << std::chrono::duration_cast<std::chrono::microseconds>(now.time_since_epoch()).count() % 1'000'000;
return ss.str();
New version:
const auto now = std::chrono::system_clock::now();
auto localNow = std::chrono::current_zone()->to_local(now);
return std::format("{0:%F} {0:%R}:{0:%S}", std::chrono::time_point_cast<std::chrono::microseconds>(localNow));
It turns out the new version is ~5 times slower.
I'm using Visual Studio 17.12.3 with /O2 and Whole Program Optimization. My CPU is an Intel 12700K.
Using the built-in performance profiler (sampling), it looks like the to_local call and the std::format call each take longer than the entire old version.
Why is the new version slower, and how can I further optimize the old version?
There are two main reasons why the "new" version is significantly slower than the "old" one:
1. std::chrono::current_zone()->to_local(now). This performs a full IANA time zone database lookup and applies the relevant UTC offset and DST rules. On Windows, MSVC's implementation loads that data through the ICU library shipped with the OS; on POSIX systems it is read from the tzdata files. Even though the database is cached after the first lookup, every call still binary-searches the in-memory transition tables and computes the offset. This is the modern approach and avoids many of the pitfalls of the old one (one small mitigation is sketched after this list). By contrast, localtime_s(&loc, &t_c) is a thin wrapper over the CRT's process-wide time zone state, which is far simpler and therefore much faster.
std::format("{0:%F} {0:%R}:{0:%S}", /*…*/);std::format uses a generic formatting engine that dispatches through std::formatter specializations for dates, performing locale checks, width and fill handling, and so on. Even though the format string is parsed at compile-time, the runtime still must traverse formatting tables and apply all formatting rules. So, even known compilers optimize <format> a lot, std::put_time(&loc, "%F %T.") is still faster, because it is time-specific formatting wrapping over strftime, which is plain C and optimized heavily for date and time formats.
Generally speaking, the "new" version is the modern, more portable solution, and that convenience comes at a cost in performance.
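If you want to quantify that cost with wall-clock numbers rather than the sampling profiler, a tiny harness is enough. In the sketch below, formatTimeOld and formatTimeNew are placeholder names for your two snippets wrapped into functions returning std::string:
#include <chrono>
#include <cstdio>
#include <string>

std::string formatTimeOld();   // the "old" snippet wrapped into a function
std::string formatTimeNew();   // the "new" snippet wrapped into a function

template <typename Fn>
double nsPerCall(Fn fn, int iterations = 200'000)
{
    std::size_t checksum = 0;                   // keep the results observable
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        checksum += fn().size();
    const auto stop = std::chrono::steady_clock::now();
    std::printf("checksum: %zu\n", checksum);   // stops the loop being optimized away
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    return static_cast<double>(ns) / iterations;
}

int main()
{
    std::printf("old: %.0f ns/call\n", nsPerCall(&formatTimeOld));
    std::printf("new: %.0f ns/call\n", nsPerCall(&formatTimeNew));
}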
If performance is strictly the goal, the "old" solution can still be optimized further by:
- Dropping std::stringstream entirely. It can be replaced with std::format_to or fmt::memory_buffer; for maximal performance, consider snprintf into a fixed buffer directly (see the sketch after this list).
- Caching the tzinfo and statically allocating the char buffers to avoid dynamic allocation altogether.
Each of these optimizations comes at the cost of reduced readability and maintainability of the code.
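For illustration, here is one possible shape of that optimized old version: an untested sketch that assumes the MSVC localtime_s signature used in your original code, formats into a fixed stack buffer, and allocates only once for the returned std::string:
// Needs <chrono>, <ctime>, <cstdio>, <string>.
const auto now = std::chrono::system_clock::now();
const std::time_t t_c = std::chrono::system_clock::to_time_t(now);
std::tm loc{};
localtime_s(&loc, &t_c);

char buf[32];
// "YYYY-MM-DD HH:MM:SS." -> 20 characters
const std::size_t len = std::strftime(buf, sizeof(buf), "%F %T.", &loc);

// Append the microseconds, zero-padded to six digits.
const auto micros = std::chrono::duration_cast<std::chrono::microseconds>(
                        now.time_since_epoch()).count() % 1'000'000;
std::snprintf(buf + len, sizeof(buf) - len, "%06lld", static_cast<long long>(micros));

return std::string(buf);   // single allocation for the result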