It is recommended to use Python multithreading only for IO-bound tasks, because Python has a global interpreter lock (GIL) that allows only one thread at a time to hold control of the Python interpreter. However, the question "Does multithreading make sense for IO-bound operations?" says that, in general, multithreading in disk IO-bound tasks only makes sense if you are accessing more than one disk, given that the bottleneck is the disk.
Given that, if I have several tasks running simultaneously that access a database on a single local disk, is there any advantage in using multithreading, since the bottleneck will be the disk?
Does the answer change if the database is stored on a single remote disk? I guess it possibly does, given that there is another variable that may be the bottleneck: the round-trip time between me and the server.
CPython and PyPy both have problems with threading CPU-bound tasks because of the GIL. Other implementations, like Jython and IronPython, do not.
Sometimes it makes sense to use multithreading or multiprocessing with I/O-bound tasks, because a disk seek is an eon to the CPU: if you can get some CPU work out of the way while you wait for a disk response, you've done a good thing.
If you write your code to have a tunable amount of parallelism, you can experimentally deduce a good number for your workload.
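As a rough sketch of what "tunable parallelism" could look like, the snippet below times the same batch of work with different worker counts. The names `fake_io_task` and `run_with_workers` are made up for illustration, and `time.sleep` is only a stand-in for a real database query or disk read, so the numbers it prints say nothing about your actual disk or server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(n):
    # Hypothetical stand-in for a database query or disk read.
    # Like real blocking I/O, time.sleep releases the GIL while waiting.
    time.sleep(0.01)
    return n * 2


def run_with_workers(num_workers, items):
    """Run fake_io_task over items using num_workers threads.

    Returns (elapsed_seconds, results).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        results = list(executor.map(fake_io_task, items))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    items = list(range(40))
    # Try several worker counts and compare the wall-clock times.
    for num_workers in (1, 2, 4, 8, 16):
        elapsed, _ = run_with_workers(num_workers, items)
        print(f"{num_workers:2d} workers: {elapsed:.3f} s")
```

With a real workload, replace `fake_io_task` with your own query and pick the worker count where the elapsed time stops improving. For a single local disk that may well be a small number.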
If you write your code to use the concurrent.futures API, you can (mostly) easily flip between threads and processes, because ThreadPoolExecutor and ProcessPoolExecutor share a similar interface.
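A minimal sketch of flipping between the two executor classes follows. The `work` and `run` functions are hypothetical names chosen for this example; only `ThreadPoolExecutor`, `ProcessPoolExecutor`, `max_workers` and `map` come from the standard library.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def work(n):
    # Hypothetical task. It is defined at module level so that
    # ProcessPoolExecutor can pickle it and send it to worker processes.
    return n * n


def run(executor_class, items, max_workers=4):
    # The only thing that changes between threads and processes
    # is which executor class is passed in.
    with executor_class(max_workers=max_workers) as executor:
        return list(executor.map(work, items))


if __name__ == "__main__":
    items = range(10)
    print(run(ThreadPoolExecutor, items))   # threads: good for blocking I/O
    print(run(ProcessPoolExecutor, items))  # processes: sidestep the GIL
```

The "mostly" matters: with processes, the function and its arguments must be picklable, and the `if __name__ == "__main__":` guard is needed on platforms that start workers by spawning a fresh interpreter.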
This API is available in CPython 3.2 and up, as well as Tauthon 2.8.
Here's an example program: http://stromberg.dnsalias.org/~strombrg/coordinate/
HTH.