I'm a novice Python hobbyist and have started experimenting with multi-threading using concurrent.futures.
Each individual thread is supposed to analyse an HTML file and then append certain items to a list. Once all threads have finished, the resulting list is then written to a CSV file.
The surprising result is that certain parts of a row seem to be offset by 1 row in the list, e.g.:
Expected result:
caseList = [
   [a1, a2, a3],
   [b1, b2, b3],
   [c1, c2, c3],
   [d1, d2, d3],
]
Actual result:
caseList = [
   [a1, a2, a3],
   [b1, a2, a3],
   [c1, b2, b3],
   [d1, c2, c3]
]
Here each letter represents one HTML file that is supposed to be analysed by one thread. I can't pinpoint exactly where it goes wrong: the list starts off correct, but then certain rows partly contain items that should belong to the previous row.
I have read about race conditions and locking, but have also read comments that list.append should be thread safe, so I'm not entirely sure what's at play here.
Here's my code:
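import concurrent.futures

caseList = []
with concurrent.futures.ThreadPoolExecutor() as executor:
    # Submit one searchCase call per file; each call returns one row.
    results = [executor.submit(searchCase, filename, pattern) for filename in logContents]
    # Collect each row as its future finishes.
    for f in concurrent.futures.as_completed(results):
        row = f.result()
        caseList.append(row)
        print(row)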
Is there anything that I am obviously doing wrong here?
This question is answered in "Avoiding race condition while using ThreadPoolExecutor".
You should not expect the generator to return results in order. The for loop:
for f in concurrent.futures.as_completed(results):
consumes the generator created by concurrent.futures.as_completed(results), which yields each future as soon as it completes. Because the tasks run asynchronously, the futures arrive in completion order, not in the order they were submitted.
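To make the ordering behaviour concrete, here is a minimal sketch. The analyse function, the file names and the delays are all made up for illustration (they are not the asker's searchCase); the sleeps just simulate files that take different amounts of time to process.

```python
import concurrent.futures
import time

# Hypothetical per-file delays: a.html is the slowest, c.html the fastest.
delays = {"a.html": 0.5, "b.html": 0.1, "c.html": 0.0}

def analyse(filename):
    # Stand-in for real work; sleeping simulates a slow or fast file.
    time.sleep(delays[filename])
    return filename

filenames = ["a.html", "b.html", "c.html"]
completion_order = []

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(analyse, name) for name in filenames]
    # as_completed hands back futures as they finish, so the slowest
    # task comes out last even though it was submitted first.
    for f in concurrent.futures.as_completed(futures):
        completion_order.append(f.result())

print("submitted:", filenames)
print("completed:", completion_order)
```

Every result is still there; only the order of the rows differs from the order of the input files.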
You can see this explanation in the concurrent.futures documentation:
concurrent.futures.as_completed(fs, timeout=None)
Returns an iterator over the Future instances (possibly created by different Executor instances) given by fs that yields futures as they complete (finished or canceled futures). Any futures given by fs that are duplicated will be returned once. Any futures that completed before as_completed() is called will be yielded first. The returned iterator raises a concurrent.futures.TimeoutError if next() is called and the result isn’t available after timeout seconds from the original call to as_completed(). timeout can be an int or float. If timeout is not specified or None, there is no limit to the wait time.
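If the rows need to come out in the same order as the input files, one option is to skip as_completed and read the futures in submission order, or to use executor.map, which returns results in the order of its inputs. The following is only a sketch: search_case here is a hypothetical stand-in for the asker's searchCase, and the file names and pattern are invented.

```python
import concurrent.futures

def search_case(filename, pattern):
    # Hypothetical stand-in: returns one row for one file.
    return [filename, pattern, len(filename)]

log_contents = ["a.html", "b.html", "c.html", "d.html"]
pattern = "case"

with concurrent.futures.ThreadPoolExecutor() as executor:
    # Option 1: keep the futures in a list and read them back in
    # submission order; result() blocks until that future is done.
    futures = [executor.submit(search_case, name, pattern) for name in log_contents]
    case_list = [f.result() for f in futures]

    # Option 2: executor.map yields results in input order.
    case_list_map = list(
        executor.map(lambda name: search_case(name, pattern), log_contents)
    )

for row in case_list:
    print(row)
```

Both variants still run the work concurrently; they only change the order in which the finished results are collected.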
Hope this helps