It is said that mmap() maps files into memory, and that this consumes virtual address space of the calling process. Does it really copy the data into memory, or does the data stay on disk? Is mmap() faster than read()?
The only thing the mmap function really does is change some kernel data structures, and possibly the page table.  It doesn't actually put anything into physical memory at all.  After you call mmap, the mapped region probably doesn't even point to physical memory: accessing it will cause a page fault.  This kind of page fault is handled transparently by the kernel; in fact, this is one of the kernel's primary duties.
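If you want to see those page faults for yourself, here is a rough sketch. It assumes Linux (or a BSD), because the `ru_minflt` field of `struct rusage` is an extension rather than something POSIX requires. The helper name `faults_on_first_touch` is made up for this example. The count it reports can be lower than the number of pages, since the kernel may set up several pages per fault, so treat it as an indication rather than an exact measurement.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: maps `npages` anonymous pages, touches each one,
 * and stores in *faults how many minor page faults the process took
 * while doing so. Returns 0 on success, -1 on error. */
int faults_on_first_touch(size_t npages, long *faults)
{
    assert(faults != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || npages == 0)
        return -1;
    size_t len = npages * (size_t)page;

    /* mmap only records the mapping; no physical pages are assigned yet. */
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap(p, len);
        return -1;
    }

    /* Touch one byte per page; the first access to a page is what
     * makes the kernel back it with physical memory. */
    for (size_t i = 0; i < len; i += (size_t)page)
        p[i] = 1;

    if (getrusage(RUSAGE_SELF, &after) != 0) {
        munmap(p, len);
        return -1;
    }

    munmap(p, len);
    *faults = after.ru_minflt - before.ru_minflt;
    return 0;
}
```

Calling `faults_on_first_touch(64, &n)` right after program start will typically report a fault count in the neighborhood of the page count, but the exact number depends on the kernel and its configuration.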
What happens with mmap is that the data remains on disk, and it is copied from disk to memory as your process reads it.  It can also be copied to physical memory speculatively (readahead).  When memory gets tight and your process's pages are evicted, the pages in the mmap region do not have to be written to swap because they are already backed by long-term storage -- unless you have modified them, of course.
However, mmap will consume virtual address space, just like malloc and other similar functions (which mostly use mmap behind the scenes, or sbrk, which plays much the same role for the heap).  The main difference between using mmap to read a file and using read to read a file is that unmodified pages in an mmap region barely contribute to overall memory pressure: they are almost "free", memory-wise, as long as they are not being used, because the kernel can drop them at any time and reload them from the file.  In contrast, data read with the read function is copied into your own buffer, and that buffer contributes to memory pressure for as long as it is allocated, whether it is being used or not, and whether it has been modified or not.
Finally, mmap is faster than read only in the use cases which it favors -- random access and page reuse.  For linearly traversing a file, especially a small file, read will generally be faster since it does not require modifying the page tables, and it takes fewer system calls.
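To make the comparison concrete, here is a minimal sketch of the same linear scan done both ways. The function names `sum_with_read` and `sum_with_mmap` and the 64 KiB buffer size are my own choices for illustration, and error handling is kept short (for example, `EINTR` is not retried). It does not measure anything; it only shows what each approach involves, so you can time them on your own files.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sum every byte of the file using a read() loop.
 * Returns 0 on success, -1 on error. */
int sum_with_read(const char *path, uint64_t *out)
{
    assert(out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char buf[65536]; /* the kernel copies file data into this buffer */
    uint64_t sum = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];

    close(fd);
    if (n < 0)
        return -1;
    *out = sum;
    return 0;
}

/* Sum every byte of the file through a read-only mapping.
 * Returns 0 on success, -1 on error. */
int sum_with_mmap(const char *path, uint64_t *out)
{
    assert(out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    uint64_t sum = 0;
    if (st.st_size > 0) { /* mmap rejects zero-length mappings */
        size_t len = (size_t)st.st_size;
        const unsigned char *p =
            mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        /* Pages are faulted in from the file as the loop reaches them. */
        for (size_t i = 0; i < len; i++)
            sum += p[i];
        munmap((void *)p, len);
    }

    close(fd);
    *out = sum;
    return 0;
}
```

Both functions return the same sum for the same file. The read version reuses one small buffer, while the mmap version needs address space for the whole file but no buffer of its own.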
As a recommendation, any large file that you will be scanning through should generally be mapped in its entirety with mmap on 64-bit systems; on 32-bit systems, where virtual address space is scarcer, you can mmap it in chunks.
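Mapping in chunks could look roughly like the sketch below. The function name `sum_in_windows` and the byte-summing loop are placeholders I made up; substitute whatever processing you need. The one hard requirement is that the file offset passed to mmap is a multiple of the page size, which is why the window is rounded up. `_FILE_OFFSET_BITS` is a glibc convention for getting a 64-bit `off_t` on 32-bit systems; other platforms may need something different.

```c
#define _FILE_OFFSET_BITS 64 /* assumption: glibc; gives a 64-bit off_t on 32-bit builds */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sums every byte of a file while keeping at most
 * `window` bytes (rounded up to whole pages) mapped at any time.
 * Returns 0 on success, -1 on error. */
int sum_in_windows(const char *path, size_t window, uint64_t *out)
{
    assert(out != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || window == 0)
        return -1;

    /* Round the window up to a page multiple so that every offset
     * passed to mmap stays page-aligned. */
    size_t pg = (size_t)page;
    window = (window + pg - 1) / pg * pg;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    uint64_t sum = 0;
    for (off_t off = 0; off < st.st_size; off += (off_t)window) {
        /* The last window may be shorter than the others. */
        size_t len = window;
        if ((off_t)len > st.st_size - off)
            len = (size_t)(st.st_size - off);

        const unsigned char *p =
            mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }

        for (size_t i = 0; i < len; i++)
            sum += p[i];

        /* Unmap before moving on so the address space can be reused. */
        munmap((void *)p, len);
    }

    close(fd);
    *out = sum;
    return 0;
}
```

On a 32-bit system a window of a few hundred megabytes is a plausible starting point, but the right size depends on how much address space the rest of your program needs.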
See also: mmap() vs. reading blocks
See also (thanks to James): When should I use mmap for file access?