According to the man page, we can specify the amount of bytes we want to read from a file descriptor.
But in the read's implementation, how many read requests will be created to perform a read?
For example, if I want to read 4MB, will it create only one request for 4MB or will it split it into multiple small requests? such as 4KB per request?
read(2) is a system call, so it calls the vDSO shared library to dispatch the system call (in very old times it used to be an interrupt, but nowadays there are faster ways of dispatching system calls).
inside the kernel the call is first handled by the vfs (virtual file system); the virtual file system provides a common interface for inodes (the structures that represents open files) and a common way of interfacing with the underlying file system.
the vfs dispatches to the underlying file system (the mount(8) program will tell you which mount point exists and what file system is used there). (see here for more information http://www.inf.fu-berlin.de/lehre/SS01/OS/Lectures/Lecture16.pdf )
the file system can do its own caching, so number of disk reads depends on what is present in the cache and how the file system allocates blocks for storage of a particular file and how the file is divided into disk blocks - all questions to the particular file system)
If you want to do your own caching then open the file with O_DIRECT flag; in this case there is an effort not to use the cache; however all reads have to be aligned to 512 offsets and come in multiples of 512 size (this is in order that your buffer can be transfered via DMA to the backing store http://www.quora.com/Why-does-O_DIRECT-require-I-O-to-be-512-byte-aligned )
It depends on how deep you go.
The C library just passes the size you gave it straight to the kernel in one read()
system call, so at that level it's just one request.
Inside the kernel, for an ordinary file in standard buffered mode the 4MB you requested is going to be copied from multiple pagecache pages (4kB each) which are unlikely to be contiguous. Any of the file data which isn't actually already in the pagecache is going to have to be read from disk. The file might not be stored contiguously on disk, so that 4MB could result in multiple requests to the underlying block device.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With