I am reading the "strlen" source code from the glibc, and the trick developers found to speed it up is to read n bytes where n is the size of a long word, instead of reading 1 byte at each iteration.
I will assume that a long word has 4 bytes.
The tricky part is that every "chunk" of 4 bytes the function reads can contain a null byte, so at each iteration, the function has to check if there was a null byte in the chunk. They do it like
if (((longword - lomagic) & ~longword & himagic) != 0) { /* null byte found */ }
where longword
is the chunk of data and himagic
and lowmagic
are magical values defined as:
himagic = 0x80808080L;
lomagic = 0x01010101L;
Here is the comment for thoses values
/* Bits 31, 24, 16, and 8 of this number are zero. Call these bits
the "holes." Note that there is a hole just to the left of
each byte, with an extra at the end:
bits: 01111110 11111110 11111110 11111111
bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
The 1-bits make sure that carries propagate to the next 0-bit.
The 0-bits provide holes for carries to fall into. */
How does this trick of finding the null byte work?
From the famous "Bit Twiddling Hacks" page By Sean Eron Anderson, a description of what is currently used in the glibc
implementation you're referring to (Anderson calls the algorithm hasless(v, 1)
):
The subexpression
(v - 0x01010101UL)
, evaluates to a high bit set in any byte whenever the corresponding byte in v is zero or greater than0x80
. The sub-expression~v & 0x80808080UL
evaluates to high bits set in bytes where the byte of v doesn't have its high bit set (so the byte was less than0x80
). Finally, by ANDing these two sub-expressions the result is the high bits set where the bytes in v were zero, since the high bits set due to a value greater than0x80
in the first sub-expression are masked off by the second.
It appears that the comment(s) in the glibc
source is confusing because it doesn't apply to what the code is actually doing anymore - it's describing what would have been an implementation of the algorithm that Anderson describes just before the hasless(v, 1)
algorithm is described.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With