For practice I've implemented the qoi specification in rust. In it there is a small hash function to store recently used pixels:
index_position = (r * 3 + g * 5 + b * 7 + a * 11) % 64
where r, g, b, and a are the red, green, blue and alpha channels respectively.
I assume this works as a hash because it creates a unique prime factorization for the numbers with the mod to limit the number of bytes. Anyways I implemented it naively in my code.
While looking at other implementations I came across this bit hack to optimize the hash calculation:
fn hash(rgba:[u8:4]) -> u8 {
let v = u32::from_ne_bytes(rgba);
let s = (((v as u64) << 32) | (v as u64)) & 0xFF00FF0000FF00FF;
s.wrapping_mul(0x030007000005000Bu64.to_le()).swap_bytes() as u8 & 63
}
I think I understand most of what's going on but I'm confused about the magic number (the multiplicand). To my understanding it should be flipped. As a step by step example:
let rgba = [0x12, 0x34, 0x56, 0x78].v the value 0x78563412.s = 0x7800340000560012.0x7800340000560012
0x030007000005000B
When multiplying it would seem that the highest value, the alpha channel (0x78), is being multiplied by 3, while the lowest value, the red channel (0x12), is being multiplied by 11. I'm also not entirely sure why this multiplication works anyway, after multiplying the values by various powers of 2.
I understand that the bytes are then swapped to big endian and trimmed, but that's not until after the multiplication step which loses me.
I know that the code produces the correct hash, but I don't understand why that's the case. Can anyone explain to me what I'm missing?
If you think about the way the math works, you want this flipped order, because it means all the results from each of the "logical" multiplications cluster in the same byte. The highest byte in the first value multiplied by the lowest byte in the second produces a result in the highest byte. The lowest byte in the first value's product with the highest byte in the second value produces a result in the same highest byte, and the same goes for the intermediate bytes.
Yes, the 0x78... and 0x03... are also multiplied by each other, but they overflow way past the top of the value and are lost. Having the order "backwards" means the result of the multiplications we care about all ends up summed in the uppermost byte (the total shift of the results we want is always 56 bits, because the 56th bit offset value is multiplied by the 0th, the 40th by the 16th, the 16th by the 40th, and the 0th by the 56th), with the rest of the multiplications we don't want having their results either overflow (and being lost) or appearing in lower bytes (which we ignore). If you flipped the bytes in the second value, the 0x78 * 0x0B (alpha value & multiplier) component would be lost to overflow, while the 0x12 * 0x03 (red value & multiplier) component wouldn't reach the target byte (every component we cared about would end up somewhat that wasn't the uppermost byte).
For a possibly more intuitive example, imagine doing the same work, but where all the bytes of one input except a single component are zero. If you multiply:
0x7800000000000000 * 0x030007000005000B
the logical result is:
0x1680348000258052800000000000000
but removing the overflow reduces that to:
0x2800000000000000
//^^ result we care about (actual product of 0x78 and 0x0B is 0x528, but only keeping low byte)
Similarly,
0x0000340000000000 * 0x030007000005000B
produces:
0x9c016c000104023c0000000000
overflowing to:
0x04023c0000000000
//^^ result we care about (actual product of 0x34 and 0x5 was 0x104, but only 04 kept)
In that case, the other multiplications did leave data in result (not all overflowed), but since we only look at the high byte, the rest gets ignored.
If you keep doing this math step by step and adding the results, you'll find that the high byte ends up the correct answer to the four individual multiplications you expected (mod 256); flip the order, and it won't work out that way.
The advantage to putting all the results in that high byte is that it allows you to use swap_bytes to move it cheaply to the low byte, and read the value directly (no need to even mask it on many architectures).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With