
How to use virtual memory/implement realloc on mac osx?

I'm playing with assembly on mac. On linux I implemented realloc by using mmap/mremap/munmap but there doesn't seem to be a mremap on mac. How would I implement realloc using virtual memory in assembly? What system call(s) would I need? I'm targeting M1 but x86-64 solutions are fine

Cal asked Oct 31 '25 13:10



2 Answers

The low-level way on macOS/Mach is to use the (mach_)vm_allocate and (mach_)vm_deallocate functions. Instead of the first, you can also use (mach_)vm_map, which lets you set the memory protection and the inheritance behaviour (how the area is handled in child processes, AFAIK).

(See also "What's the difference between mach_vm_allocate and vm_allocate?")

Reallocation (growing in place) is done by allocating another area directly after the first. The kernel tries to merge (coalesce) the two if conditions allow. See the implementation of vm_map_enter (also used by vm_allocate) and scroll down to the comment "See whether we can avoid creating a new entry …", where the logic starts.

So first you ask the kernel to allocate anywhere, then you ask it to allocate exactly after the region(s) you've already allocated.

The tags like VM_MAKE_TAG(VM_MEMORY_MALLOC_LARGE) play a role here, and as far as I understand the region you allocate to grow a previous one should use VM_MAKE_TAG(VM_MEMORY_REALLOC), but it looks like that's not strictly necessary.
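
To make the flag values concrete, here is a minimal sketch in C of how the tag and the placement flag combine into the single flags word passed to the allocation calls. The EX_ names are example-only stand-ins I'm defining here so the snippet is self-contained; on macOS you would include <mach/vm_statistics.h> and use VM_FLAGS_ANYWHERE, VM_MAKE_TAG and friends directly.

```c
#include <assert.h>
#include <stdint.h>

/* Example-only copies of values from <mach/vm_statistics.h>; on macOS,
   include that header and use the real names instead. */
#define EX_VM_FLAGS_ANYWHERE      0x0001u
#define EX_VM_MEMORY_MALLOC_LARGE 3u
#define EX_VM_MEMORY_REALLOC      6u
#define EX_VM_MAKE_TAG(tag)       ((uint32_t)(tag) << 24) /* tag lives in the top byte */

/* First allocation: let the kernel choose the address, tag it as malloc-large. */
static const uint32_t ex_first_flags =
    EX_VM_FLAGS_ANYWHERE | EX_VM_MAKE_TAG(EX_VM_MEMORY_MALLOC_LARGE);

/* Tag for the region that extends an existing allocation. */
static const uint32_t ex_realloc_tag = EX_VM_MAKE_TAG(EX_VM_MEMORY_REALLOC);

/* Compile-time check of the resulting bit patterns. */
static_assert((EX_VM_FLAGS_ANYWHERE | EX_VM_MAKE_TAG(EX_VM_MEMORY_MALLOC_LARGE)) == 0x03000001u,
              "ANYWHERE | MALLOC_LARGE tag");
static_assert(EX_VM_MAKE_TAG(EX_VM_MEMORY_REALLOC) == 0x06000000u,
              "REALLOC tag");
```

This is where the hard-coded 0x0300 and 0x0600 halfwords in the assembly version further down come from: they are the tags shifted into the top byte.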

You can also see this being done in magazine_large.c in Apple's libmalloc (search for VM_MEMORY_REALLOC). If this fails, a new area of the wanted size is allocated and the old content is copied using vm_copy if possible, otherwise a simple memcpy is done.


Allocations need to be page-aligned, so at runtime you need to know the page size the kernel uses. In C that's easy: use the VM_PAGE_SIZE macro, the vm_page_size variable, or one of several other ways. But in assembly (without any shared libraries), how do you get this information? Via the comm page, an area the kernel maps into every process at the same address, which gives processes some information without the need for a kernel call.
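
As a small illustration of the page-alignment arithmetic, here is a sketch in C that turns a page shift into a page size and rounds a requested size up to whole pages. The helper names page_size_from_shift and round_up_to_pages are made up for this example; the shift itself would come from the comm page as described above (14, i.e. 16 KiB pages, is what I'd expect on an M1, but read it at runtime rather than assuming it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: page size in bytes for a given shift (14 -> 16384). */
static inline size_t page_size_from_shift(uint8_t shift) {
    assert(shift < sizeof(size_t) * 8); /* shifting by >= width is undefined */
    return (size_t)1 << shift;
}

/* Hypothetical helper: round `size` up to a multiple of the page size. */
static inline size_t round_up_to_pages(size_t size, uint8_t shift) {
    size_t mask = page_size_from_shift(shift) - 1;
    return (size + mask) & ~mask;
}
```

For example, round_up_to_pages(1, 14) gives 16384, and a size that is already a multiple of the page size is returned unchanged.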


So let's do this in C first. The following source should output (res = 0) (aka KERN_SUCCESS) for every operation.

#include <stdint.h>
#include <stdio.h>
#include <string.h> // memset
#include <mach/mach_vm.h>
#include <mach/mach_init.h>
#include <mach/task_info.h>
#include <unistd.h>

int main(int argc, const char * argv[]) {
#if defined(__arm64__)
    // From xnu/osfmk/arm/cpu_abilities.h
#define _COMM_PAGE64_RO_ADDRESS       (0x0000000FFFFF4000ULL)
#define _COMM_PAGE_USER_PAGE_SHIFT_64 (_COMM_PAGE64_RO_ADDRESS+0x025)
    uint8_t commPageVMPageShift = *(uint8_t const * const)_COMM_PAGE_USER_PAGE_SHIFT_64;
    printf("Page shift from COMM_PAGE: %u, page size: %u\n", commPageVMPageShift, 1u << commPageVMPageShift);
#endif
    
    mach_vm_address_t address = 0;
    mach_vm_size_t size = VM_PAGE_SIZE;
//    kern_return_t res = mach_vm_allocate(mach_task_self(), &address, size, VM_FLAGS_ANYWHERE | VM_MAKE_TAG(VM_MEMORY_MALLOC_LARGE));
    kern_return_t res = mach_vm_map(mach_task_self(), &address, size, 0, VM_FLAGS_ANYWHERE | VM_MAKE_TAG(VM_MEMORY_MALLOC_LARGE), MEMORY_OBJECT_NULL, 0, FALSE, VM_PROT_DEFAULT, VM_PROT_ALL, VM_INHERIT_DEFAULT);
    printf("Allocated %llu bytes at %p (res = %d)\n", size, (void *)address, res);
    memset((void *)address, 0x42, size);
    
    mach_vm_size_t newSize = size + VM_PAGE_SIZE;
    mach_vm_address_t nextAddress = address + size;
    res = mach_vm_allocate(mach_task_self(), &nextAddress, newSize - size, VM_MAKE_TAG(VM_MEMORY_REALLOC));
    printf("Allocated additional %llu bytes at %p (res = %d)\n", newSize - size, (void *)nextAddress, res);
    
    res = mach_vm_deallocate(mach_task_self(), address, newSize);
    printf("Deallocated everything (res = %d)\n", res);

    return 0;
}

Whether you use mach_vm_allocate or mach_vm_map here doesn't matter.


And now for fun, let's do it in ARM64 assembly without the use of any library:

// From xnu/osfmk/arm/cpu_abilities.h
_COMM_PAGE64_RO_ADDRESS = 0x0000000FFFFF4000
_COMM_PAGE_USER_PAGE_SHIFT_64 = (_COMM_PAGE64_RO_ADDRESS + 0x025)

// Kernel: "mach_task_self"
// Source: osfmk/mach/syscall_sw.h
.equ SYSCALL_MACH_TASK_SELF, -28

// Kernel: "_kernelrpc_mach_vm_allocate_trap"
// Source: osfmk/mach/syscall_sw.h
.equ SYSCALL_MACH_VM_ALLOCATE, -10

// Kernel: "_kernelrpc_mach_vm_deallocate_trap"
// Source: osfmk/mach/syscall_sw.h
.equ SYSCALL_MACH_VM_DEALLOCATE, -12

// Define some helper registers (could also use the stack or variables in heap)
REG_PAGE_SIZE .req X20
REG_MACH_TASK_SELF .req X21
REG_ADDRESS .req X22

.global _main
_main:
    // Set up a stack frame
    stp FP, LR, [SP, #-16]!
    mov FP, SP

    // Read the page size shift used by kernel from the comm page.
    mov X0, #(_COMM_PAGE_USER_PAGE_SHIFT_64 & 0xFFFF)
    movk X0, #((_COMM_PAGE_USER_PAGE_SHIFT_64 >> 16) & 0xFFFF), LSL #16
    movk X0, #((_COMM_PAGE_USER_PAGE_SHIFT_64 >> 32) & 0xFFFF), LSL #32
    movk X0, #((_COMM_PAGE_USER_PAGE_SHIFT_64 >> 48) & 0xFFFF), LSL #48
    ldrb W1, [X0]
    // Calculate the page size using the shift value.
    mov W2, #1
    lslv W3, W2, W1
    // Remember the page size.
    mov REG_PAGE_SIZE, X3

    // Get mach_task_self
    mov X16, SYSCALL_MACH_TASK_SELF
    svc 80
    // Save mach_task_self for later use.
    mov REG_MACH_TASK_SELF, X0

    // Allocate one page.
    str XZR, [SP, #-16]! // Push null pointer on stack
    mov X0, REG_MACH_TASK_SELF // Arg 1: target
    mov X1, SP // Arg 2: pointer to address (in/out)
    mov X2, REG_PAGE_SIZE // Arg 3: size to allocate
    mov X3, #1 // Arg 4: VM_FLAGS_ANYWHERE | VM_MAKE_TAG(VM_MEMORY_MALLOC_LARGE)
    movk X3, #0x0300, LSL #16 // value of the flags: 0x3000001
    mov X16, SYSCALL_MACH_VM_ALLOCATE
    svc 80
    cbnz X0, L_exit // Exit on failure.

    // Now we have the page address on the stack.
    ldr REG_ADDRESS, [SP]
    // Calculate adjacent page address
    add X0, REG_ADDRESS, REG_PAGE_SIZE
    // Store on the stack again for next call to `vm_allocate`
    str X0, [SP]

    // Allocate adjacent page.
    mov X0, REG_MACH_TASK_SELF // Arg 1: target
    mov X1, SP // Arg 2: pointer to address (in/out)
    mov X2, REG_PAGE_SIZE // Arg 3: size to allocate
    // Arg 4: VM_FLAGS_FIXED | VM_MAKE_TAG(VM_MEMORY_REALLOC), value 0x6000000.
    // Must not be VM_FLAGS_ANYWHERE: we want exactly this address, or a failure.
    movz X3, #0x0600, LSL #16
    mov X16, SYSCALL_MACH_VM_ALLOCATE
    svc 80
    cbnz X0, L_exit // Exit on failure.

    // Now we have two pages! On the stack we have the address of the second page.
    mov X0, REG_MACH_TASK_SELF // Arg 1: target
    mov X1, REG_ADDRESS // Arg 2: address (not a pointer)
    add X2, REG_PAGE_SIZE, REG_PAGE_SIZE // Arg 3: 2 * page_size
    mov X16, SYSCALL_MACH_VM_DEALLOCATE
    svc 80

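L_exit:
    // Drop anything pushed since the prologue and restore FP/LR.
    // X0 still holds the last kern_return_t and becomes the return code.
    mov SP, FP
    ldp FP, LR, [SP], #16
    ret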

Assemble and link using as -o example.o example.s ; ld -o example example.o. Run the binary, and if everything went well it should exit with return code 0.

DarkDust answered Nov 03 '25 05:11



Using only POSIX mmap flags, the "optimistic" strategy is what Jester suggested, using mmap without MAP_FIXED to try to allocate new pages contiguous with what you already have. (The first arg is a "hint" of where you'd like it to allocate).

Instead of failing, it will allocate somewhere else (unless virtual address space is full, but that's unlikely on 64-bit). So you need to detect that mmap's return value != your hint. Probably just munmap that untouched space and ask again with the full size you need, then copy. You could attempt to mmap the remaining space onto the end of the new pages you just got, but that could fail and then you're making even more system calls.

On Linux you'd use mmap(MAP_FIXED_NOREPLACE) to return an error if it can't allocate where you want (without overlapping / replacing existing mappings).

Of course Linux mremap is even better: with MREMAP_MAYMOVE it never has to copy the data, it just maps the same physical pages at a new virtual address. (That lets realloc be much more efficient for growing big arrays.) If macOS doesn't offer similar functionality via macOS-specific function calls or mmap flags, you simply can't get that behaviour.


I find it really dumb that C++ std::vector is designed so it can't easily take advantage of realloc (and thus mremap) even where it exists, because replaceable new is a potentially visible side effect, and that the new/delete allocator API entirely lacks a try-realloc that you could use even with non-trivially-copyable types. But this overly conservative design in some higher-level languages means that low-level features might not get much use even if they existed, so I wouldn't be surprised if macOS lacked it.

OTOH, C realloc certainly can use mremap if the original allocation has its pages to itself, and lots of stuff is written in C, not hobbled by C++'s allocator API. So macOS might well support something like this somehow, but I don't know macOS-specific system call details.

I did have a look at the table of BSD system calls in the Darwin XNU kernel (https://github.com/opensource-apple/xnu/blob/master/bsd/kern/syscalls.master), as suggested by "macOS 64-bit System Call Table".

There might be other whole categories of system call, but I'd hope that any mmap-related calls would be in the BSD family of calls, using the 0x2000000 class bit.

There is an int memorystatus_control(uint32_t command, int32_t pid, uint32_t flags, user_addr_t buffer, size_t buffersize); but that returns an int, so I assume it's not what we're looking for.

I didn't see any other system calls that looked at all promising for this.

I didn't check the macOS man page for mmap; if it has any macOS-specific flags like MAP_FIXED_NOREPLACE, they'd hopefully be there.

Peter Cordes answered Nov 03 '25 04:11
