In replica set mode, each write operation to any collection in any DB also writes to the oplog collection.
Now, when writing to multiple DBs in parallel, all these write operations also write to the oplog. My question: do these write operations require locking the oplog? (I'm using the w:1 write concern.) If they do, this is rather like having a global lock shared by all the write operations to all the different DBs, isn't it?
I'd be happy to get any hints on this.
According to the documentation, in replication, when MongoDB writes to a collection on the primary, MongoDB also writes to the primary’s oplog, which is a special collection in the local database. Therefore, MongoDB must lock both the collection’s database and the local database. The mongod must lock both databases at the same time to keep the database consistent and ensure that write operations, even with replication, are “all-or-nothing” operations.
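To make the "lock both databases at the same time" idea concrete, here is a minimal toy model in Python. It is only a sketch under my own assumptions, not how mongod is implemented: the class ToyPrimary, its fields, and run_demo are hypothetical names, and plain mutexes stand in for MongoDB's real lock manager.

```python
import threading


class ToyPrimary:
    """Toy model only: one lock per database plus one lock for 'local' (the oplog)."""

    def __init__(self, db_names):
        self.db_locks = {name: threading.Lock() for name in db_names}
        self.local_lock = threading.Lock()  # stands in for the lock on the local database
        self.data = {name: [] for name in db_names}
        self.oplog = []

    def write(self, db_name, doc):
        # Hold both locks while writing, so the data change and its oplog
        # entry appear together ("all-or-nothing").
        # Locks are always taken in the same order (database, then local)
        # so two writers cannot deadlock each other.
        with self.db_locks[db_name]:
            with self.local_lock:
                self.data[db_name].append(doc)
                self.oplog.append({"ns": db_name, "o": doc})


def run_demo(writes_per_db=100):
    """Write to two databases from two threads and return the toy primary."""
    primary = ToyPrimary(["db_a", "db_b"])

    def worker(db_name):
        for i in range(writes_per_db):
            primary.write(db_name, {"_id": i})

    threads = [threading.Thread(target=worker, args=(name,)) for name in primary.data]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return primary
```

In this model the writers to db_a and db_b never block each other on their own database locks, but every write still has to pass through local_lock, which is the contention the question is asking about.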
This means that concurrent writes to multiple databases on the primary can all end up contending for the lock on the local database, which behaves much like a global lock across those write operations. This does not apply to secondaries, because MongoDB does not apply writes serially there; instead it collects oplog entries in batches and then applies those batches in parallel.
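The batch-and-apply-in-parallel behaviour on secondaries can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual replication code: apply_batch is a hypothetical function, the entry fields (ns, _id, value) are simplified, and grouping by namespace is just one possible way to split a batch so that entries for the same collection keep their relative order.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def apply_batch(batch, store, workers=4):
    """Apply a batch of simplified oplog entries to `store`, one thread per namespace.

    Assumption: entries for different namespaces are independent, so only the
    order within a namespace has to be preserved.
    """
    groups = defaultdict(list)
    for entry in batch:
        groups[entry["ns"]].append(entry)

    # Create the per-namespace containers up front, in a single thread,
    # so that worker threads never modify the outer dict.
    for ns in groups:
        store.setdefault(ns, {})

    def apply_group(collection, entries):
        # Entries within one namespace are applied in their original order.
        for e in entries:
            collection[e["_id"]] = e["value"]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(apply_group, store[ns], entries)
                   for ns, entries in groups.items()]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return store


batch = [
    {"ns": "a.x", "_id": 1, "value": "v1"},
    {"ns": "b.y", "_id": 1, "value": "w1"},
    {"ns": "a.x", "_id": 1, "value": "v2"},
]
result = apply_batch(batch, {})
```

The two entries for a.x are applied in order by the same worker, so the later value wins, while b.y can be handled by a different worker at the same time.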
Disclaimer: This is all off the top of my head, so please do not crucify me if I have made a mistake. However, please do correct me.
Why should they?
Let's assume true parallelism of queries being applied. So we have two queries arriving at the very same time, and we need to decide which one to insert into the oplog first. The first one to take the lock will write first, right? Except there is a problem. Let's assume the first query is a simple one, db.collection.update({_id:"foo"},{$set:{"bar":"baz"}}), while the other query is more complicated and therefore takes longer to evaluate for correctness. The order in which the two reach the oplog would then depend on evaluation time rather than on arrival. To prevent that, a lock would have to be taken on arrival and released only after the idempotent oplog entry was written.
Here is where I have to rely on my memory:
However, queries aren't applied in parallel. Queries are queued and evaluated in order of arrival. The database gets locked when the queries are applied, after they have run through the query optimizer. During that lock, the idempotent oplog entries are written to the oplog. Since databases are not interconnected and only one query can be applied to a database at any given time, the lock on the database is sufficient. No two data-changing queries can be applied to the same database concurrently anyway, so why should a lock be set on the oplog?
Apparently, a lock is taken on the local database. However, since a lock is already taken on the data, I do not see the reason why. *scratchingMyHead*
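To illustrate what I mean by an idempotent oplog entry, here is a small Python sketch. It is a toy model built on my own assumptions: apply_update and to_idempotent_entry are hypothetical helpers that understand only $set and $inc, and the idea that a relative change such as $inc is logged as a $set of the resulting value is my recollection of the oplog format, not a quote from the documentation.

```python
def apply_update(doc, update):
    """Apply a tiny subset of update operators ($set, $inc) to a copy of doc."""
    result = dict(doc)
    for field, value in update.get("$set", {}).items():
        result[field] = value
    for field, delta in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + delta
    return result


def to_idempotent_entry(before, update):
    """Rewrite an update as a $set of the resulting field values.

    Replaying the returned entry any number of times yields the same document,
    which is not true for the original update if it contains $inc.
    """
    after = apply_update(before, update)
    changed = {k: v for k, v in after.items()
               if k not in before or before[k] != v}
    return {"$set": changed}


doc = {"_id": "foo", "count": 1}
update = {"$inc": {"count": 2}, "$set": {"bar": "baz"}}

entry = to_idempotent_entry(doc, update)
once = apply_update(doc, entry)
twice = apply_update(once, entry)       # replaying the entry changes nothing
replayed_raw = apply_update(apply_update(doc, update), update)  # $inc applied twice
```

Here once and twice are the same document, whereas replaying the raw update would increment count a second time, which is why the logged form matters.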