I have a program that runs in AWS and reads over 400k documents from a DB. It ran flawlessly until recently. I'm not sure what changed, but now I'm getting pymongo.errors.CursorNotFound: cursor id "..." not found
I tried researching and it seems to be a connection issue to the DB, but I have not changed anything.
Below is the stack trace:
Text Analysis Started....
DB Connection init...
Traceback (most recent call last):
  File "predict.py", line 8, in <module>
    textanalyser.start()
  File "/usr/src/app/text_analyser.py", line 100, in start
    for row in table_data:
  File "/usr/local/lib/python3.7/site-packages/pymongo/cursor.py", line 1156, in next
    if len(self.__data) or self._refresh():
  File "/usr/local/lib/python3.7/site-packages/pymongo/cursor.py", line 1093, in _refresh
    self.__send_message(g)
  File "/usr/local/lib/python3.7/site-packages/pymongo/cursor.py", line 955, in __send_message
    address=self.__address)
  File "/usr/local/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1346, in _run_operation_with_response
    exhaust=exhaust)
  File "/usr/local/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1464, in _retryable_read
    return func(session, server, sock_info, slave_ok)
  File "/usr/local/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1340, in _cmd
    unpack_res)
  File "/usr/local/lib/python3.7/site-packages/pymongo/server.py", line 136, in run_operation_with_response
    _check_command_response(first)
  File "/usr/local/lib/python3.7/site-packages/pymongo/helpers.py", line 156, in _check_command_response
    raise CursorNotFound(errmsg, code, response)
pymongo.errors.CursorNotFound: cursor id 3011673819761463104 not found
Any help you can provide would be greatly appreciated.
This is a very common issue in MongoDB. I will elaborate on the issue first then provide possible workarounds for you.
Whenever you perform a find or aggregate operation on MongoDB, it returns a cursor with a unique cursor id. The server deletes the cursor after a period of inactivity (10 minutes by default, configurable on the server) to save memory and CPU on the machine running MongoDB. The results are not sent all at once: the client fetches them in batches, and a single batch is limited to 16MB or to the batch size you set.
Let's assume you run a find that matches 1000 documents, fetched in batches of 100, against a MongoDB server with a 10 minute cursor idle timeout. If processing documents 300 - 400 takes more than 10 minutes, the server deletes the cursor in the meantime. When the client then asks for the next batch (documents 400 - 500), the server can no longer find that cursor id and the request fails with CursorNotFound.
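To make the arithmetic concrete, here is a rough sketch that estimates how long the cursor sits idle between two fetches and compares that with the idle timeout. The function names and the 7 seconds per document figure are made up for illustration, and the model ignores network latency and anything else the server does.

```python
def seconds_between_fetches(batch_size, seconds_per_document):
    """Approximate idle time the server sees between two batch requests."""
    return batch_size * seconds_per_document


def cursor_would_expire(batch_size, seconds_per_document, idle_timeout_seconds=600):
    """Return True if processing one batch takes longer than the idle timeout."""
    return seconds_between_fetches(batch_size, seconds_per_document) > idle_timeout_seconds


# 100 documents per batch at 7 seconds each: 700 s between fetches, over the 600 s limit.
print(cursor_would_expire(100, 7))  # True
# Same workload with 10 documents per batch: only 70 s between fetches.
print(cursor_would_expire(10, 7))   # False
```

The second call hints at why a smaller batch size helps: the slower the per-document processing, the fewer documents you can afford to hold between fetches.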
There are a few workarounds, though.
Workaround - 1:
You can pass the no_cursor_timeout=True option to find so that the server does not time out the idle cursor.
Note: Don't forget to close the cursor at the end, because an unclosed cursor keeps holding resources on the server.
# col is your pymongo collection
cursor = col.find({}, no_cursor_timeout=True)
for x in cursor:
    print(x)
cursor.close()  # <- Don't forget to close the cursor
Workaround - 2:
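Additionally, lower the batch size with the batch_size option, for example batch_size=1.
This overrides the default and makes the server send one document per batch. The client then goes back to the server more often, so the cursor is never idle for long.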
cursor = col.find({}, no_cursor_timeout=True, batch_size=1)
for x in cursor:
    print(x)
cursor.close()  # <- Don't forget to close the cursor
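Both workarounds depend on close() actually being called, which the loops above skip if processing a document raises an exception. One possible way to guarantee it is contextlib.closing from the standard library. The sketch below is only an illustration: process_all and FakeCursor are hypothetical names, and FakeCursor stands in for a real cursor so the example runs without a database.

```python
from contextlib import closing


def process_all(cursor, handle):
    """Feed every document from `cursor` to `handle`; close the cursor even on error."""
    processed = 0
    with closing(cursor):  # calls cursor.close() when the block exits
        for document in cursor:
            handle(document)
            processed += 1
    return processed


class FakeCursor:
    """Hypothetical stand-in for a database cursor (iterable, with close())."""

    def __init__(self, documents):
        self._documents = iter(documents)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._documents)

    def close(self):
        self.closed = True


cursor = FakeCursor([{"_id": 1}, {"_id": 2}])
seen = []
count = process_all(cursor, seen.append)
print(count, cursor.closed)  # 2 True
```

With PyMongo you would pass the real cursor instead, for example process_all(col.find({}, no_cursor_timeout=True, batch_size=1), print).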