
How does the GIL handle chunked I/O read/write?

Say I had an io.BytesIO() that I wanted to write a response to, running on a thread:

import io
import requests

f = io.BytesIO()  # in-memory buffer for the response body
with requests.Session() as s:
    r = s.get(url, stream=True)  # url is defined elsewhere
    for chunk in r.iter_content(chunk_size=1024):
        f.write(chunk)

This is written to memory rather than to a hard disk (I have plenty of memory for my purpose), so I don't have to worry about the disk's needle being a bottleneck. I know from the docs and this SO post by Alex Martelli that the GIL is released for blocking I/O (file read/write), but I wonder: does the GIL just get released on f.write() and then reacquired on the __next__() call of the loop?

So what I end up with is a bunch of fast GIL acquisitions and releases. Obviously I would have to time this to determine anything worth noting, but does writing to in-memory file objects in a multithreaded web scraper generally allow other threads to run without the GIL?

If not, I'll just handle the large responses by dumping them into a queue and processing them on __main__.

pstatix asked Dec 18 '25 22:12



1 Answer

From what I can see in the BytesIO type's source code, the GIL is not released during a call to BytesIO.write, since it's just doing a quick memory copy. It's only for system calls that may block that it makes sense for the GIL to be released.

There probably is such a syscall in the __next__ method of the r.iter_content generator (when data is read from a socket), but there's none on the writing side.
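To make the two sides of that loop concrete, here is a minimal sketch that mirrors the question's code without needing a network. It assumes CPython and swaps `requests` for a local `socket.socketpair()`; the helper names `send_payload` and `read_into_buffer` are hypothetical and exist only for this illustration. The `recv` call is the step that may block (and where CPython releases the GIL while waiting), while `BytesIO.write` is only an in-memory copy.

```python
import io
import socket
import threading


def send_payload(sock, payload, chunk_size=1024):
    """Hypothetical stand-in for a remote server: send the payload in chunks."""
    with sock:  # closing the socket signals end-of-stream to the reader
        for i in range(0, len(payload), chunk_size):
            sock.sendall(payload[i:i + chunk_size])


def read_into_buffer(sock, chunk_size=1024):
    """Read chunks from the socket into an in-memory buffer."""
    buf = io.BytesIO()
    with sock:
        while True:
            # May block waiting for data; CPython releases the GIL around
            # the underlying syscall and reacquires it before returning.
            chunk = sock.recv(chunk_size)
            if not chunk:  # empty bytes means the peer closed the connection
                break
            # Plain memory copy; the GIL stays held for the whole call.
            buf.write(chunk)
    return buf


payload = bytes(range(256)) * 40  # 10240 bytes of sample data
server_side, client_side = socket.socketpair()
sender = threading.Thread(target=send_payload, args=(server_side, payload))
sender.start()
buffer = read_into_buffer(client_side)
sender.join()
print(len(buffer.getvalue()))
```
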

But I think your question reflects an incorrect understanding of what it means for a builtin function to release the GIL when doing a blocking operation. It will release the GIL just before it makes the potentially blocking syscall, but it will reacquire the GIL before it returns to Python code. So it doesn't matter how many such GIL-releasing operations you have in a loop: all the Python code involved will run with the GIL held. The GIL is never released by one operation and reclaimed by a different one. It's both released and reclaimed within each operation, as a single self-contained step.
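One rough way to observe this from Python is to time several threads that each make a single blocking call. The sketch below assumes CPython and uses `time.sleep` as a stand-in for a blocking syscall; `blocking_call` and `run_threads` are hypothetical helper names. Because each call releases the GIL only while it waits, the waits overlap and the total time stays close to one call's duration rather than the sum of all four.

```python
import threading
import time


def blocking_call(duration):
    # Stand-in for a blocking syscall: CPython releases the GIL while
    # sleeping and reacquires it before returning to Python code.
    time.sleep(duration)


def run_threads(n, duration):
    """Run n threads that each block once; return the elapsed wall time."""
    threads = [
        threading.Thread(target=blocking_call, args=(duration,))
        for _ in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_threads(4, 0.2)
print(f"4 threads x 0.2s of blocking took {elapsed:.2f}s")
```

Only the waiting overlaps here. On a standard GIL build, the Python bytecode before and after each blocking call still runs one thread at a time.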

As an example, you can look at the C code that implements writing to a file descriptor. The macro Py_BEGIN_ALLOW_THREADS releases the GIL. A few lines later, Py_END_ALLOW_THREADS reacquires the GIL. No Python-level code runs in between those steps, only a few low-level C assignments regarding errno, and the write syscall that might block while waiting on the disk.
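The same release-and-reacquire pattern can be seen from the Python side with a pipe. This is a minimal sketch assuming CPython; the `reader` function and the `received` list are hypothetical names for the illustration. One thread blocks in `os.read` on an empty pipe, and the main thread can still run Python code and then call `os.write`, which would not be possible if the blocked thread kept holding the GIL.

```python
import os
import threading

read_fd, write_fd = os.pipe()
received = []


def reader():
    # Blocks until data is available. CPython releases the GIL around the
    # read syscall and reacquires it before the result is returned.
    received.append(os.read(read_fd, 5))


t = threading.Thread(target=reader)
t.start()

# The main thread keeps executing Python code even if the reader thread
# is already blocked inside os.read.
progress = sum(range(100_000))

os.write(write_fd, b"hello")  # unblocks the reader
t.join()
os.close(read_fd)
os.close(write_fd)
print(received, progress)
```
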

Blckknght answered Dec 21 '25 12:12
