
MongoDB Python bindings an order of magnitude slower than Java?

I asked this same question on the mongodb-user list: http://groups.google.com/group/mongodb-user/browse_thread/thread/b3470d6a867cd24

I was hoping someone on this forum might have some insight...

I've run a simple experiment comparing the performance of cursor iteration in Python vs. Java and have found that the Python implementation is about 10x slower. I was hoping someone could tell me if this difference is expected or if I'm doing something clearly inefficient on the Python side.

The benchmark is simple: it performs a query, iterates over the cursor, and inspects the same field in each document. In the Python version, I can inspect about 22k documents per second. In the Java version, I can inspect about 220k documents per second.

I've seen a few similar questions about Python performance, and I've taken the advice and made sure I'm using the C extensions:

>>> import pymongo 
>>> pymongo.has_c() 
True 
>>> import bson 
>>> bson.has_c() 
True 

Finally, I don't believe the discrepancy is due to fundamental differences between Python and Java, at least at the level of my test code. For example, if I store the queried documents in a Python list, I can iterate over that list very quickly. In other words, it's not an inefficient Python for-loop that accounts for the difference. Furthermore, I get almost identical performance in Java and Python when inserting documents.
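To make the comparison above concrete, here is a minimal sketch of a timing helper that works on any iterable, so the same loop can be run once over a live cursor and once over a plain list. The function name `docs_per_second`, the field name `created`, and the sample data are made up for illustration; they are not taken from the actual benchmark.

```python
import time


def docs_per_second(docs, field):
    """Iterate over docs, read one field from each, return (count, rate)."""
    count = 0
    start = time.perf_counter()
    for doc in docs:
        _ = doc[field]  # inspect the same field in every document
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


# Stand-in data: an in-memory list of small documents (hypothetical fields).
sample = [{"created": i, "a": "x", "b": "y"} for i in range(100_000)]
count, rate = docs_per_second(sample, "created")
print(f"{count} documents at {rate:,.0f} docs/s")

# With a real collection, the same helper could be pointed at a cursor,
# assuming `collection` and `query` already exist:
#   count, rate = docs_per_second(collection.find(query), "created")
```

If the list case is fast and the cursor case is slow, the time is going into fetching and decoding documents rather than into the loop itself.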

Here are a few more details about the query:

  • Both the Python and Java implementations use the same query on the same collection and run on the same machine.
  • The collection contains about 20 million documents.
  • The query returns about 2 million documents, i.e., I'm retrieving about 10% of the collection.
  • Each document contains three simple fields: a date and two strings.
  • The query is indexed and the time spent in the actual query is negligible for both the Python and Java implementations. It's the cursor iteration that accounts for the runtime.
Sam asked Sep 10 '25 23:09


1 Answer

Well, looking at your post on Google Groups as well, here's my 2c:

  1. Python is slower than Java. Since Python is dynamically typed, its interpreter cannot do all the Java JIT "magic", so it will generally be slower at runtime.

  2. On the Google Groups thread it is stated that:

"The big surprise in the results is how the Python benchmark performance degrades when I insert shorter values. If anything, I would have expected the opposite. Comparatively, the Java numbers are essentially the same for long vs. short strings".

This can be misleading due to Mongo's asynchronous behaviour when it comes to writes. Make sure you set the same Write Concern when you fire those writes in both your Java and Python benchmarks (and preferably set it to SAFE_MODE). In other words, if you don't specifically set any Write Concern, make sure the driver's default value is the same in both Python and Java variants.
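As a rough sketch of pinning the write concern explicitly on the Python side, assuming a recent PyMongo where `WriteConcern` and `Collection.with_options` are available (older driver versions used a `safe=True` flag on the write call instead). The helper name `insert_acknowledged` is made up for illustration, and the Java benchmark would need the equivalent setting.

```python
def insert_acknowledged(collection, docs, w=1):
    """Insert docs and wait for the server to acknowledge the write.

    `collection` is assumed to be a pymongo Collection. w=1 asks for an
    acknowledgement from the primary, so the timing includes the round trip.
    """
    # Imported lazily so this sketch can be loaded without PyMongo installed.
    from pymongo import WriteConcern

    acked = collection.with_options(write_concern=WriteConcern(w=w))
    result = acked.insert_many(docs)
    return len(result.inserted_ids)
```

Timing inserts through a helper like this in both benchmarks keeps unacknowledged writes from making one driver look faster than it is.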

Shivan Dragon answered Sep 13 '25 04:09