OR instruction in assembly into ECX register

Tags: x86, assembly

In a book I'm reading, we are given the following snippet and problem:

This function uses a combination SCAS and STOS to do its work. First, explain what is the type of the [EBP+8] and [EBP+C] in line 1 and 8, respectively. Next, explain what this snippet does:

01: 8B 7D 08    mov edi, [ebp+8]
02: 8B D7       mov edx, edi
03: 33 C0       xor eax, eax
04: 83 C9 FF    or ecx, 0FFFFFFFFh
05: F2 AE       repne scasb
06: 83 C1 02    add ecx, 2
07: F7 D9       neg ecx
08: 8A 45 0C    mov al, [ebp+0Ch]
09: 8B FA       mov edi, edx
10: F3 AA       rep stosb
11: 8B C2       mov eax, edx

I had nearly figured out everything after checking an online solution (https://johannesbader.ch/2014/05/practical-reverse-engineering-exercises-page-11/). However, one step in this snippet still does not make sense to me.

According to the online solution, when we run the command or ecx, 0FFFFFFFFh at line 4, it says

We [now] interpret ECX as a signed integer -1

To know the result of the or instruction, wouldn't we need to know the previous value of ECX? And why is the value -1?

Thanks

X33 asked Sep 20 '25 12:09


1 Answer

The 32-bit two's complement representation of -1 is 0xFFFFFFFF (all ones). For every bit, 1 OR x is 1, so this unconditionally sets ecx to -1 regardless of its previous value. This trick only works for -1 (all bits set), because OR can only set bits, never clear them.
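As a rough illustration in C rather than assembly, the sketch below models the register as a uint32_t. The helper names or_all_ones and as_signed are made up for this example, and it only imitates the arithmetic of the instruction, not its flag effects:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models `or ecx, 0FFFFFFFFh` on a 32-bit value. */
uint32_t or_all_ones(uint32_t old_ecx)
{
    /* Every mask bit is 1, and (1 OR x) is 1 for either value of x. */
    return old_ecx | UINT32_C(0xFFFFFFFF);
}

/* Reinterpret the same 32 bits as a signed two's complement integer,
   roughly what a debugger does when asked for a signed decimal view. */
int32_t as_signed(uint32_t bits)
{
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Compile-time spot check for one arbitrary starting value. */
static_assert((UINT32_C(0x12345678) | UINT32_C(0xFFFFFFFF)) == UINT32_C(0xFFFFFFFF),
              "OR with all-ones ignores the old value");
```

Whatever value is passed to or_all_ones, the result has all 32 bits set, and as_signed reads that bit pattern as -1.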


The part of the solution that you quote, about interpreting "ecx as a signed integer -1", is only sensible in the context of the gdb command that follows: (gdb) p/d $ecx -> $7 = -1.

rep prefixes treat ecx as an unsigned counter. Setting ecx to -1 / UINT_MAX means repne scasb will only stop when it finds a zero in memory, not because ecx counted down all the way. (In theory, if there was no zero, it would count down and end that way, but in practice it would segfault first. -1 isn't a special-case for rep).
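To see how the counter behaves, here is a minimal C model of lines 03 to 07 of the snippet. The function name scan_length is hypothetical, the loop only approximates repne scasb (it ignores flags and the direction flag is assumed clear), and it assumes the buffer really contains a zero byte:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of lines 03-07: returns the value left in ecx
   after `neg ecx`. */
uint32_t scan_length(const unsigned char *edi)
{
    assert(edi != NULL);

    const unsigned char al = 0;          /* xor eax, eax */
    uint32_t ecx = UINT32_C(0xFFFFFFFF); /* or ecx, 0FFFFFFFFh */

    /* repne scasb: each iteration compares al with [edi], advances edi
       and decrements ecx; it stops when ecx reaches 0 or the compared
       byte equals al. */
    while (ecx != 0) {
        int match = (*edi++ == al);
        ecx--;
        if (match)
            break;
    }

    /* For a string of n non-zero bytes, n + 1 bytes were scanned, so
       ecx is now -(n + 2) in two's complement. */
    ecx += 2;                            /* add ecx, 2  ->  -n */
    ecx = 0u - ecx;                      /* neg ecx     ->   n */
    return ecx;
}
```

For the 3-character string "abc" the loop runs four times (three letters plus the terminator), leaving ecx at -5; adding 2 and negating gives 3, which is the count that rep stosb then uses.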


Why use or: code size

The "normal" way to set a register to anything other than zero is with a 5 byte mov r32, imm32 insn, for example B9 FF FF FF FF mov ecx,-1.

If you care more about code-size than speed, or you know that a false dependency on ecx isn't a problem here, you can save two bytes by using a sign-extended 8-bit immediate: or r/m32, imm8.

83 C9 FF    or ecx, 0FFFFFFFFh
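The sign extension and the size difference can be sketched in C. The helper sign_extend_imm8 and the array names are invented for this example; the byte sequences are the two encodings quoted above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen an 8-bit immediate to 32 bits the way the
   imm8 form of OR does, by copying bit 7 into bits 8..31. */
uint32_t sign_extend_imm8(uint8_t imm8)
{
    return (imm8 & 0x80u) ? (UINT32_C(0xFFFFFF00) | imm8) : imm8;
}

/* Machine code for the two ways of setting ecx to -1. */
const uint8_t mov_ecx_minus1[] = { 0xB9, 0xFF, 0xFF, 0xFF, 0xFF }; /* mov ecx, -1 */
const uint8_t or_ecx_minus1[]  = { 0x83, 0xC9, 0xFF };             /* or ecx, 0FFFFFFFFh */

/* The or form is two bytes shorter. */
static_assert(sizeof mov_ecx_minus1 - sizeof or_ecx_minus1 == 2,
              "or ecx, imm8 saves two bytes over mov ecx, imm32");
```

The single immediate byte FF widens to 0xFFFFFFFF, which is why the disassembler prints the operand as 0FFFFFFFFh even though only one byte of it is stored in the instruction.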

None of the bits in the result actually depend on the old value of ecx, because every bit of the sign-extended immediate is 1. However, real CPUs don't special-case this, so the or can't start executing until the old value of ecx is ready. This is a false dependency on the old value of ecx; mov, by contrast, breaks the dependency on the previous value. (For more about this, see the x86 tag wiki, especially Agner Fog's guides).

or ecx, imm8 needs a ModRM byte to encode the destination as ecx, unlike that form of mov where there's a separate opcode for each destination register. There's unfortunately no opcode for mov r/m32, imm8, which would save 2 bytes of code in many instructions.

If Intel had been willing to drop backwards compatibility with undocumented instructions, they could have added it. (8086 didn't have it, because it would only help 16-bit code when moving an immediate to memory. They already dedicated 8 opcodes to mov r16, imm16, which is 3 bytes in 16-bit mode where it doesn't need an operand-size prefix, just like the non-existent mov r/m16, imm8 would be.)


So this is a useful idiom when optimizing for code-size, e.g. for a bootloader, or a machine-code answer on https://codegolf.stackexchange.com/. (Yes, that's a thing.)

Another related trick is using a 3-byte lea to create a constant, if you already have another constant in another register. e.g. for x86-64 Adler32, I needed two zeroed registers and a 1, so I used

401120:       31 c0          xor  eax,eax
401122:       99             cdq                 # zero rdx by sign-extending eax (0) into edx
401123:       8d 7a 01       lea  edi,[rdx+0x1]  # edi=0+1, using a reg + disp8 addressing mode

Peter Cordes answered Sep 23 '25 10:09