In a book I'm reading, we are given the following snippet and problem:
This function uses a combination SCAS and STOS to do its work. First, explain what is the type of the [EBP+8] and [EBP+C] in line 1 and 8, respectively. Next, explain what this snippet does:
01: 8B 7D 08 mov edi, [ebp+8]
02: 8B D7 mov edx, edi
03: 33 C0 xor eax, eax
04: 83 C9 FF or ecx, 0FFFFFFFFh
05: F2 AE repne scasb
06: 83 C1 02 add ecx, 2
07: F7 D9 neg ecx
08: 8A 45 0C mov al, [ebp+0Ch]
09: 8B FA mov edi, edx
10: F3 AA rep stosb
11: 8B C2 mov eax, edx
I had nearly figured out everything after checking an online solution (https://johannesbader.ch/2014/05/practical-reverse-engineering-exercises-page-11/); however, one step in this snippet still does not make sense to me.
According to the online solution, when we run the instruction or ecx, 0FFFFFFFFh at line 4, it says:
We [now] interpret ECX as a signed integer -1
In order to know what the result of the or instruction is going to be, wouldn't we need to know the previous value of ECX? And why is the value -1?
Thanks
The 32-bit two's complement representation of -1 is 0xFFFFFFFF (all-ones). 1 OR x is always 1, so this unconditionally sets ecx to -1. This trick only works for -1, because OR can only set bits, not clear them to zero.
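As a quick sanity check of the claim that the old value doesn't matter, here is a minimal C sketch. The helper names or_all_ones and as_signed are made up for illustration; they only model what the instruction does to a 32-bit register and how a debugger would print it as signed.

```c
#include <stdint.h>

/* Hypothetical helper: models `or ecx, 0FFFFFFFFh` on a 32-bit register. */
uint32_t or_all_ones(uint32_t old_ecx)
{
    return old_ecx | 0xFFFFFFFFu;   /* every bit is forced to 1 */
}

/* Hypothetical helper: reinterpret a 32-bit pattern as a two's complement
   signed value, the way `p/d $ecx` displays it. Written without relying on
   implementation-defined narrowing conversions. */
int32_t as_signed(uint32_t bits)
{
    if (bits >= 0x80000000u)
        return -(int32_t)(~bits) - 1;
    return (int32_t)bits;
}
```

Whatever you pass as old_ecx, or_all_ones returns 0xFFFFFFFF, and as_signed turns that pattern into -1.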
The part of the solution that you quote, about interpreting "ecx as a signed integer -1", is only sensible in the context of the gdb command that follows: (gdb) p/d $ecx -> $7 = -1.
rep prefixes treat ecx as an unsigned counter. Setting ecx to -1 / UINT_MAX means repne scasb will only stop when it finds a zero in memory, not because ecx counted down all the way. (In theory, if there was no zero, it would count down and end that way, but in practice it would segfault first. -1 isn't a special-case for rep.)
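To see how the counter arithmetic works out, here is a rough C model of the snippet from the question. It is only a sketch: the function names scan_length and fill_string are invented, and the local variables simply mirror the registers. It assumes the argument at [ebp+8] points to a writable, zero-terminated string.

```c
#include <stdint.h>

/* Hypothetical model of lines 03-07: what ends up in ecx. */
uint32_t scan_length(const char *s)
{
    uint32_t ecx = 0xFFFFFFFFu;   /* or ecx, 0FFFFFFFFh */
    const char *edi = s;          /* mov edi, [ebp+8] */

    /* repne scasb with al = 0: compare a byte, advance edi, decrement
       ecx, and stop after the byte that matched (or when ecx hits 0). */
    while (ecx != 0) {
        char c = *edi++;
        ecx--;
        if (c == 0)
            break;
    }

    ecx += 2;                     /* add ecx, 2 */
    ecx = 0u - ecx;               /* neg ecx */
    return ecx;                   /* number of bytes before the terminator */
}

/* Hypothetical model of the whole snippet. */
char *fill_string(char *s, char value)
{
    uint32_t ecx = scan_length(s);
    char *edi = s;                /* mov edi, edx */

    while (ecx != 0) {            /* rep stosb with al = value */
        *edi++ = value;
        ecx--;
    }
    return s;                     /* mov eax, edx */
}
```

For a string of length n the scan touches n+1 bytes (including the terminator), so ecx ends at -(n+2); adding 2 and negating gives n. In other words, the snippet behaves like a strlen followed by a memset over the string's characters, leaving the terminator in place.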
or: code size
The "normal" way to set a register to anything other than zero is with a 5-byte mov r32, imm32 insn, for example B9 FF FF FF FF mov ecx,-1.
If you care more about code-size than speed, or you know that a false dependency on ecx isn't a problem here, you can save two bytes by using a sign-extended 8-bit immediate: or r/m32, imm8.
83 C9 FF or ecx, 0FFFFFFFFh
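A small C sketch of what "sign-extended 8-bit immediate" means here, and of the two-byte saving. The helper name sign_extend_imm8 and the array names are made up for illustration; the byte sequences are the two encodings quoted above.

```c
#include <stdint.h>

/* Hypothetical helper: widen an imm8 to 32 bits by copying its top bit
   into the upper 24 bits. */
uint32_t sign_extend_imm8(uint8_t imm8)
{
    if (imm8 & 0x80u)
        return 0xFFFFFF00u | imm8;
    return imm8;
}

/* The two encodings discussed above, as raw machine-code bytes. */
static const uint8_t mov_ecx_minus1[] = { 0xB9, 0xFF, 0xFF, 0xFF, 0xFF };
static const uint8_t or_ecx_minus1[]  = { 0x83, 0xC9, 0xFF };
```

sign_extend_imm8(0xFF) gives 0xFFFFFFFF, which is why the disassembler prints the operand as 0FFFFFFFFh even though only one immediate byte is stored. Comparing sizeof on the two arrays shows the 5-byte versus 3-byte difference.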
None of the bits in the result actually depend on the old value of ecx, because x OR 1 is always 1. However, real CPUs don't special-case this, so the or can't execute until the old value of ecx is ready. This is a false dependency on the old value of ecx. mov only writes its destination, so it breaks the dependency on the previous value. (For more about this, see the x86 tag wiki, especially Agner Fog's guides).
or ecx, imm8 needs a ModRM byte to encode the destination as ecx, unlike that form of mov where there's a separate opcode for each destination register. There's unfortunately no opcode for mov r/m32, imm8, which would save 2 bytes of code in many instructions.
If Intel had been willing to drop backwards compatibility with undocumented instructions, they could have added it. (8086 didn't have it, because it would only help 16-bit code when moving an immediate to memory. They already dedicated 8 opcodes to mov r16, imm16, which is 3 bytes in 16-bit mode where it doesn't need an operand-size prefix, just like the non-existent mov r/m16, imm8 would be.)
So this is a useful idiom when optimizing for code-size, e.g. for a bootloader, or a machine-code answer on https://codegolf.stackexchange.com/. (Yes, that's a thing.)
Another related trick is using a 3-byte lea to create a constant, if you already have another constant in another register. e.g. for x86-64 Adler32, I needed two zeroed registers and a 1, so I used:
401120: 31 c0 xor eax,eax
401122: 99 cdq # zero rdx by sign-extending eax (0) into edx
401123: 8d 7a 01 lea edi,[rdx+0x1] # edi=0+1, using a reg + disp8 addressing mode
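If it helps to follow the register values, here is a tiny C model of those three instructions. The struct and function names are hypothetical; the fields just stand in for the 32-bit registers.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three registers involved. */
struct regs {
    uint32_t eax, edx, edi;
};

struct regs make_constants(void)
{
    struct regs r;
    r.eax = 0;                                         /* xor eax, eax */
    /* cdq: fill edx with copies of eax's sign bit */
    r.edx = (r.eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    r.edi = r.edx + 1u;                                /* lea edi, [rdx+0x1] */
    return r;
}
```

The result is eax = 0, edx = 0 and edi = 1, produced by 2 + 1 + 3 = 6 bytes of machine code in the listing above.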