
Assembly string instructions register DS and ES in real mode

I have been studying this assembly program from my book and I have a question about it. The purpose of this program is to simply copy string1 to string2. My question relates to the following two instructions:

mov    AX,DS        
mov    ES,AX 

I see that without them the program doesn't work properly, but I would have thought that pointing ESI to string1 and EDI to string2 would be all you need to do: then just increment ESI and EDI and move the string character by character. What exactly does DS hold, and why do we need to move it to ES?

.DATA
string1    db    'The original string',0
strLen     EQU   $ - string1
.UDATA
string2    resb    80
.CODE
    .STARTUP
    mov    AX,DS          ; set up ES
    mov    ES,AX          ;  to the data segment
    mov    ECX,strLen     ; strLen includes NULL
    mov    ESI,string1
    mov    EDI,string2
    cld                   ; forward direction
    rep    movsb
Rubiks asked Jan 30 '26 22:01


1 Answer

Every string instruction that uses EDI (or DI or RDI) addresses it as ES:EDI.

Explicit addressing modes that use EDI (like [edi]) default to DS, but movs/stos/scas/cmps (with or without rep/repz/repnz) all use es:edi. lods uses only ds:esi. (rep lods "works", but is rarely useful. With cx=0 or 1 it can work as a slow conditional load because, unlike loop, rep checks cx before decrementing.)
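
As a rough illustration of why ES matters, the addressing rule can be modeled in C. This is only a sketch under stated assumptions: `Regs`, `linear`, `rep_movsb`, and `copy_visible_through_ds` are hypothetical names, the offsets are arbitrary, and the model covers only 16-bit real mode with DF = 0 and addresses wrapped at 1 MiB (A20 disabled), which real hardware may not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 1 MiB real-mode address space (A20 disabled). */
enum { MEM_SIZE = 1 << 20 };

typedef struct {
    uint16_t ds, es;  /* segment registers */
    uint16_t si, di;  /* index registers, 16-bit address size */
    uint16_t cx;      /* repeat count */
} Regs;

/* Real-mode linear address: segment * 16 + offset, wrapped to 20 bits. */
uint32_t linear(uint16_t seg, uint16_t off) {
    return (((uint32_t)seg << 4) + off) & (MEM_SIZE - 1);
}

/* Model of rep movsb with DF = 0: reads DS:SI, writes ES:DI. */
void rep_movsb(uint8_t *mem, Regs *r) {
    assert(mem != NULL && r != NULL);
    while (r->cx != 0) {
        mem[linear(r->es, r->di)] = mem[linear(r->ds, r->si)];
        r->si++;
        r->di++;
        r->cx--;
    }
}

/* Copies a string with the given DS and ES, then reports whether the copy
   is visible at DS:0x0100, where code that addresses string2 through DS
   would look for it. The offsets 0x0010 and 0x0100 are arbitrary. */
int copy_visible_through_ds(uint16_t ds, uint16_t es) {
    static uint8_t mem[MEM_SIZE];
    static const char src[] = "The original string";
    const uint16_t src_off = 0x0010, dst_off = 0x0100;
    Regs r = { ds, es, src_off, dst_off, (uint16_t)sizeof src };

    for (size_t i = 0; i < MEM_SIZE; i++) mem[i] = 0;
    for (size_t i = 0; i < sizeof src; i++)
        mem[linear(ds, (uint16_t)(src_off + i))] = (uint8_t)src[i];

    rep_movsb(mem, &r);

    for (size_t i = 0; i < sizeof src; i++)
        if (mem[linear(ds, (uint16_t)(dst_off + i))] != (uint8_t)src[i])
            return 0;
    return 1;
}
```

In this model, `copy_visible_through_ds(0x1000, 0x1000)` returns 1, while `copy_visible_through_ds(0x1000, 0x2000)` returns 0: the bytes are still copied, but they land 0x10000 bytes above where DS-relative code expects string2.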

Note that even though scas is read-only, it uses (r|e)di. This makes it work well with lods: load from one array with lods, then use scas to compare against a different array (optionally with some kind of processing of (r|e)ax before the compare).


Normally when you can use 32-bit addresses, you have a flat memory model where all segments have the same base and limit. Or if you're making a .COM flat binary with NASM, you have the tiny real-mode memory model where all segments have the same value. See @MichaelPetch's comments on this answer and on the question. If your program doesn't work without setting ES, you're doing something weird. (like maybe clobbering es somewhere?)

Note that rep movsb in 16-bit mode without an address-size prefix uses CX, DS:SI, and ES:DI, regardless of whether you used operand-size prefixes to write edi instead of di.


Also note that rep string instructions (and especially the non-rep versions) are often not the fastest way to do things. They're good for code-size, but often slower than SSE/AVX loops.

rep stos and rep movs have fast microcoded implementations that store or copy in chunks of 16 or 32 bytes (or 64 bytes on Skylake-AVX512?). See Enhanced REP MOVSB for memcpy. With 32-byte aligned pointers and medium to large buffer sizes, they can be as fast as optimized AVX loops. With sizes below 128 or 256 bytes on modern CPUs, or with unaligned pointers, AVX copy loops typically win. Intel's optimization manual has a section on this.

But repe cmpsb is definitely not the fastest way to implement memcmp: use SSE2 or AVX2 SIMD compares (pcmpeqb), because the microcode still only compares a byte at a time. (Beware of reading past the end of the buffer; especially avoid crossing a page boundary, or preferably even a cache-line boundary.) Anyway, repne / repe don't have "fast strings" optimizations in Intel or AMD CPUs, unfortunately.
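
A minimal sketch of the SIMD approach, using SSE2 intrinsics in C (`_mm_cmpeq_epi8` is the intrinsic for pcmpeqb). `buffers_equal` is a hypothetical name; it only reports equality rather than memcmp's ordering result, it is not tuned for performance, and it handles the last few bytes one at a time so that it never reads past the end of either buffer. On targets without SSE2 it falls back to the plain byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Returns 1 if the first n bytes of a and b are equal, else 0. */
int buffers_equal(const void *a, const void *b, size_t n) {
    const uint8_t *p = (const uint8_t *)a;
    const uint8_t *q = (const uint8_t *)b;
    size_t i = 0;

    assert(n == 0 || (a != NULL && b != NULL));

#if defined(__SSE2__)
    /* Compare 16 bytes per iteration with unaligned loads. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(p + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(q + i));
        __m128i eq = _mm_cmpeq_epi8(va, vb);   /* 0xFF in each equal byte */
        if (_mm_movemask_epi8(eq) != 0xFFFF)   /* some byte differed */
            return 0;
    }
#endif

    /* Tail (or whole buffer without SSE2): one byte at a time. */
    for (; i < n; i++)
        if (p[i] != q[i])
            return 0;
    return 1;
}
```

The bytewise tail is the simplest way to stay inside the buffers; a tuned implementation would more likely use a final overlapping 16-byte load for sizes of 16 bytes or more.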

Peter Cordes answered Feb 02 '26 14:02