I asked this same question on the mongodb-user list: http://groups.google.com/group/mongodb-user/browse_thread/thread/b3470d6a867cd24
I was hoping someone on this forum might have some insight...
I've run a simple experiment comparing the performance of cursor iteration using python vs. java and have found that the python implementation is about 10x slower. I was hoping someone could tell me if this difference is expected or if I'm doing something clearly inefficient on the python side.
The benchmark is simple: it performs a query, iterates over the cursor, and inspects the same field in each document. In the python version, I can inspect about 22k documents per second. In the java version, I can inspect about 220k documents per second.
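For reference, the Python side of a benchmark like this can be sketched roughly as follows. This is only a minimal sketch, not the exact code from the test: the helper name `docs_per_second` and the sample data are made up for illustration, and it accepts any iterable of dict-like documents so that the same timing loop can be run against a live cursor or against a plain list.

```python
import time


def docs_per_second(docs, field):
    """Iterate over docs, read one field from each, return (count, docs/sec).

    `docs` may be any iterable of dict-like documents (hypothetical helper,
    not part of any driver).
    """
    count = 0
    start = time.perf_counter()
    for doc in docs:
        _ = doc.get(field)  # inspect the same field in every document
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


if __name__ == "__main__":
    # Stand-in data; a real run would pass a query cursor instead.
    sample = [{"_id": i, "value": str(i)} for i in range(100000)]
    count, rate = docs_per_second(sample, "value")
    print(f"{count} documents, {rate:.0f} docs/sec")
```

With pymongo, the cursor returned by something like `collection.find(query)` could be passed in place of `sample` (here `collection` and `query` are placeholders). Timing the same loop over `list(cursor)` afterwards separates the cost of fetching and decoding documents from the cost of the for-loop itself.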
I've seen a few similar questions about python performance and I've taken the advice and made sure I'm using the C extensions:
>>> import pymongo
>>> pymongo.has_c()
True
>>> import bson
>>> bson.has_c()
True
Finally, I don't believe the discrepancy is due to fundamental differences between python and java, at least at the level of my test code. For example, if I store the queried documents in a python list, I can iterate over that list very quickly. In other words, it's not an inefficient python for-loop that accounts for the difference. Furthermore, I get almost identical performance in Java and Python when inserting documents.
Here are a few more details about the query:
Well, looking at your post on Google Groups as well, here's my 2c:
Python is slower than Java. Since Python is not statically typed, its interpreter cannot do all the Java JIT "magic", so it will generally be slower at runtime.
On the Google Groups thread it is stated that:
"The big surprise in the results is how the Python benchmark performance degrades when I insert shorter values. If anything, I would have expected the opposite. Comparatively, the Java numbers are essentially the same for long vs. short strings".
This can be misleading due to Mongo's asynchronous behaviour when it comes to writes. Make sure you set the same Write Concern when you fire those writes in both your Java and Python benchmarks (and preferably set it to SAFE_MODE, so that each write is acknowledged before the next one is sent). In other words, if you don't specifically set any Write Concern, make sure the driver's default value is the same in both the Python and Java variants.