Can kernel code make things read-only in a way that other kernel code can't undo?

I'm under the impression that the Linux kernel's attempts to protect itself revolve around not letting malicious code run in kernelspace. In particular, I thought that if a malicious kernel module were to get loaded, it would be "game over" from a security standpoint. However, I recently came across a post that contradicts this belief and says that there is some way that the kernel can protect parts of itself from other parts of itself:

There is plenty of mechanisms to protect you against malicious modules. I write kernel code for fun so I have some experience in the field; it's basically a flag in the pagetable.

What's there to stop any kernel module from changing that flag in the pagetable back? The only protection against malicious modules is keeping them from loading at all. Once one loads, it's game over.

Make the pagetable readonly. Done.

Kernel modules can just make it read-write again, the same way your code made it read-only, then carry on with their changes.

You can actually lock this down so that kernel mode cannot modify the page table until an interrupt occurs. If your IDT is read-only as well there is no way for the module to do anything about it.

That doesn't make any sense to me. Am I missing something big about how kernel memory works? Can kernelspace code restrict itself from modifying the page table? Would that really prevent kernel rootkits? If so, then why doesn't the Linux kernel do that today to put an end to all kernel rootkits?

Joseph Sible-Reinstate Monica asked May 23 '26 11:05



1 Answer

If the malicious kernel code is loaded in the trusted way (e.g. loading a kernel module and not exploiting a vulnerability) then no: kernel code is kernel code.

Intel CPUs do have a series of mechanisms to disable read/write access to kernel memory:

  • CR0.WP if set disallows write accesses to both user and kernel read-only pages. Used to detect bugs in the kernel code.
  • CR4.PKE if set (4-level paging must be enabled, which is mandatory in 64-bit mode) disallows the kernel from accessing user-mode pages (instruction fetches excluded) unless they are tagged with the right key (which determines their RW permissions). Used to allow the kernel to write to structures like the vDSO and KUSER_SHARED_DATA but not to other user-mode structures. The key permissions are held in a register (PKRU), not in memory; the keys themselves are in the page table entries.
  • CR4.SMEP if set disallows kernel instruction fetching from user mode pages. Used to prevent attacks where a kernel function pointer is redirected to a user mode allocated page (e.g. the nelson.c privilege escalation exploit).
  • CR4.SMAP if set disallows kernel data accesses to user-mode pages: implicit accesses always, and explicit accesses too when EFLAGS.AC=0 (thus overriding the protection keys). Used to enforce a stricter no-user-mode-access policy.
  • Of course the R/W and U/S bits in the paging structures control if the item is read-only/read-write and assigned to user or kernel.
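
To make the list above a bit more concrete, here is a minimal user-space sketch of where these bits live and how two of the checks (SMEP and SMAP) combine. The bit positions are the ones documented in the Intel manual; `struct cpu_state` and the two helper functions are hypothetical names made up for illustration, and the model deliberately ignores protection keys and the R/W bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as documented in the Intel SDM. */
#define CR0_WP    (UINT64_C(1) << 16)
#define CR4_SMEP  (UINT64_C(1) << 20)
#define CR4_SMAP  (UINT64_C(1) << 21)
#define CR4_PKE   (UINT64_C(1) << 22)
#define EFLAGS_AC (UINT64_C(1) << 18)
#define PTE_RW    (UINT64_C(1) << 1)   /* R/W flag in a paging-structure entry */
#define PTE_US    (UINT64_C(1) << 2)   /* U/S flag in a paging-structure entry */

/* Hypothetical snapshot of the registers discussed above (not a real API). */
struct cpu_state {
    uint64_t cr0;
    uint64_t cr4;
    uint64_t eflags;
};

/* SMEP: may supervisor code fetch instructions from a user-mode page? */
bool kernel_may_fetch_user_page(const struct cpu_state *cpu)
{
    assert(cpu != NULL);
    return (cpu->cr4 & CR4_SMEP) == 0;
}

/*
 * SMAP: may supervisor code perform an explicit data access to a
 * user-mode page? Setting EFLAGS.AC suspends the check.
 */
bool kernel_may_touch_user_data(const struct cpu_state *cpu)
{
    assert(cpu != NULL);
    if ((cpu->cr4 & CR4_SMAP) == 0)
        return true;
    return (cpu->eflags & EFLAGS_AC) != 0;
}
```

Note that every input to these checks is a register that code running at CPL 0 is allowed to rewrite, which is the core of the argument that follows.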

You can read how permissions are applied for supervisor-mode accesses in the Intel manual:

Data writes to supervisor-mode addresses.
Access rights depend on the value of CR0.WP:
- If CR0.WP = 0, data may be written to any supervisor-mode address.
- If CR0.WP = 1, data may be written to any supervisor-mode address with a translation for which the R/W flag (bit 1) is 1 in every paging-structure entry controlling the translation; data may not be written to any supervisor-mode address with a translation for which the R/W flag is 0 in any paging-structure entry controlling the translation.
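
The quoted rule can be restated as a small predicate. This is only a sketch of the logic in the manual's wording, not how the hardware or any kernel implements it; `supervisor_write_allowed` and its parameters are hypothetical names, and `entries` stands for the paging-structure entries (e.g. PML4E, PDPTE, PDE, PTE) that control one translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_RW (UINT64_C(1) << 1)   /* R/W flag, bit 1 of every entry */

/*
 * Returns true if a supervisor-mode data write to a supervisor-mode
 * address is permitted, given CR0.WP and the entries that control the
 * translation of that address.
 */
bool supervisor_write_allowed(bool cr0_wp, const uint64_t *entries, size_t levels)
{
    assert(entries != NULL && levels > 0);

    /* CR0.WP = 0: the R/W flags are ignored for supervisor writes. */
    if (!cr0_wp)
        return true;

    /* CR0.WP = 1: R/W must be 1 in every entry of the walk. */
    for (size_t i = 0; i < levels; i++) {
        if ((entries[i] & PTE_RW) == 0)
            return false;
    }
    return true;
}
```

With a four-level walk whose last entry has R/W clear, the predicate is false when `cr0_wp` is true and true when it is false.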

So even if the kernel protected a page X as read-only and then protected the page structures themselves as read-only, a malicious module could simply clear CR0.WP.
It could also change CR3 and use its own paging structures.
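
The CR0.WP part of this argument can be played through in a toy user-space model. Everything here (`struct toy_cpu`, `toy_lock_down`, `toy_module_attack`) is hypothetical and only simulates the permission logic; on real hardware the module would use a privileged `mov` to CR0, which is an instruction and not a memory write, so no page permission applies to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CR0_WP (UINT64_C(1) << 16)
#define PTE_RW (UINT64_C(1) << 1)

/* Toy machine: one protected page X plus the page holding the page tables. */
struct toy_cpu {
    uint64_t cr0;
    uint64_t pte_x;       /* entry mapping page X */
    uint64_t pte_tables;  /* entry mapping the page tables themselves */
};

/* Simplified supervisor write check: R/W only matters when CR0.WP is set. */
bool toy_write_ok(const struct toy_cpu *cpu, uint64_t pte)
{
    assert(cpu != NULL);
    return (cpu->cr0 & CR0_WP) == 0 || (pte & PTE_RW) != 0;
}

/* The "defender": make X and the page tables read-only, then set WP. */
void toy_lock_down(struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->pte_x &= ~PTE_RW;
    cpu->pte_tables &= ~PTE_RW;
    cpu->cr0 |= CR0_WP;
}

/* The "malicious module": it also runs at CPL 0, so it may rewrite CR0. */
bool toy_module_attack(struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->cr0 &= ~CR0_WP;                   /* a register write, not a memory write */
    return toy_write_ok(cpu, cpu->pte_x);  /* page X is writable again */
}
```

After `toy_lock_down`, writes to both page X and the page tables are refused, yet `toy_module_attack` still succeeds without ever touching a read-only page.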

Note that Intel developed SGX to address the threat model where the kernel itself is evil.
However, running kernel components inside enclaves in a secure way (i.e. with no single point of failure) may not be trivial.

Another approach is virtualizing the kernel with the VMX extension, though this is by no means trivial to implement.

Finally, the CPU has four protection levels at the segmentation layer but paging has only two: supervisor (CPL = 0) and user (CPL > 0).
It is theoretically possible to run a kernel component in "Ring 1" but then you'd need to make an interface (e.g. something like a call gate or syscall) for it to access the other kernel functions.
It's easier to run it in user mode altogether (since you don't trust the module in the first place).

I have no idea what this is supposed to mean:

You can actually lock this down so that kernel mode cannot modify the page table until an interrupt occurs.

I don't recall any mechanism by which interrupt handling locks or unlocks anything. I'm curious, though: if anybody can shed some light on this, they are welcome.

Security in x86 CPUs (though this may be generalized) has always been hierarchical: whoever comes first sets up the constraints for whoever comes later.
There is usually little to no protection between non-isolated components at the same hierarchical level.

Margaret Bloom answered May 25 '26 07:05



