We have a scenario where we need to store multiple feeds under a site model, as follows:
{
  id: site_id,
  name: site_name,
  feeds: [
    {
      url: feed_url_1,
      date: feed_update_date_1
    },
    {
      url: feed_url_2,
      date: feed_update_date_2
    },
    ...
  ]
}
Since feeds is an array, we can update it with $set, $push or $addToSet.
Two different race conditions (write skew) may occur when our concurrent application (queue workers) tries to update the same site model.
If we pick $set and guard against duplicates on the client side, then when two queues write to the same site, one feed may be lost with the following sequence.
Given a WordPress site, extract two feeds (RSS and ATOM) and dispatch them to Q1 and Q2.
Q1: load existing feeds, check that the RSS feed is new
Q2: load existing feeds, check that the ATOM feed is new
Q1: $set feeds => [RSS]
Q2: $set feeds => [ATOM]
Now the RSS feed is lost.
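The interleaving above can be reproduced without a database. Below is a minimal Python sketch, assuming a plain dict as a hypothetical stand-in for the sites collection (load_feeds and set_feeds are illustrative helpers, not MongoDB calls): each worker reads its own snapshot, appends its feed locally, and writes the whole array back, as $set does.

```python
import copy

# Hypothetical in-memory stand-in for the sites collection (not a MongoDB API).
db = {"site_1": {"id": "site_1", "name": "example", "feeds": []}}

def load_feeds(site_id):
    # Each worker gets its own snapshot of the feeds array.
    return copy.deepcopy(db[site_id]["feeds"])

def set_feeds(site_id, feeds):
    # Models `$set: {feeds: ...}`: the whole array is replaced.
    db[site_id]["feeds"] = feeds

def is_new(snapshot, url):
    # Client-side duplicate guard, evaluated against a possibly stale snapshot.
    return all(f["url"] != url for f in snapshot)

q1 = load_feeds("site_1")  # Q1 reads []
q2 = load_feeds("site_1")  # Q2 reads []
if is_new(q1, "rss"):
    q1.append({"url": "rss", "date": "2014-01-01"})
if is_new(q2, "atom"):
    q2.append({"url": "atom", "date": "2014-01-01"})
set_feeds("site_1", q1)  # feeds = [RSS]
set_feeds("site_1", q2)  # feeds = [ATOM]; RSS is overwritten

print([f["url"] for f in db["site_1"]["feeds"]])  # ['atom']
```

Each step is correct in isolation; the loss comes from Q2 writing back an array computed from a snapshot that predates Q1's write.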
If we pick $push or $addToSet, then the following may happen.
User A adds a site, which puts its RSS feed on Q1
User B adds the same site, which puts the same RSS feed on Q2
Q1: load existing feeds, check that the RSS feed is new
Q2: load existing feeds, check that the RSS feed is new
Q1: $push RSS
Q2: $push RSS
Now the RSS feed has been duplicated.
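The same kind of in-memory sketch shows the duplicate. Both workers check a stale snapshot, and each then applies an unconditional append, which is how $push behaves. The dict and the helper names (has_feed, push_feed) are again hypothetical stand-ins, not MongoDB calls.

```python
# Hypothetical in-memory stand-in for the sites collection.
db = {"site_1": {"id": "site_1", "feeds": []}}

def has_feed(snapshot, url):
    # Client-side check against a snapshot taken earlier.
    return any(f["url"] == url for f in snapshot)

def push_feed(site_id, feed):
    # Models `$push`: append unconditionally.
    db[site_id]["feeds"].append(feed)

snap1 = list(db["site_1"]["feeds"])  # Q1 loads: []
snap2 = list(db["site_1"]["feeds"])  # Q2 loads: []
rss = {"url": "rss", "date": "2014-01-01"}
if not has_feed(snap1, "rss"):
    push_feed("site_1", dict(rss))   # Q1: $push RSS
if not has_feed(snap2, "rss"):
    push_feed("site_1", dict(rss))   # Q2: $push RSS

print(len(db["site_1"]["feeds"]))  # 2
```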
If our data model were simply { url }, then $addToSet would safeguard against duplicate feeds. Unfortunately this is not the case: the date attribute may differ, so $addToSet is not much safer than $push.
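To see why the date attribute defeats $addToSet, here is a small Python model of its rule: the new element is appended only if no existing element equals it as a whole. This is an illustrative sketch (add_to_set is a hypothetical helper), not the server implementation. One difference to keep in mind: Python dict equality ignores key order, whereas MongoDB treats an embedded document as a duplicate only when the fields, values and field order all match, so the real operator is, if anything, stricter about what counts as a duplicate.

```python
def add_to_set(array, element):
    # Models $addToSet: append only if no element equals the whole new element.
    if element not in array:
        array.append(element)

# With a { url }-only model, the second add is a no-op.
url_only = []
add_to_set(url_only, {"url": "rss"})
add_to_set(url_only, {"url": "rss"})
print(len(url_only))  # 1

# With { url, date }, a different date makes it a "different" element.
with_date = []
add_to_set(with_date, {"url": "rss", "date": "2014-01-01"})
add_to_set(with_date, {"url": "rss", "date": "2014-01-02"})
print(len(with_date))  # 2
```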
We have thought of a few possible workarounds to this problem, but none are great given our tight schedule:
1. Decouple feeds from the site into their own collection, safeguarded by url alone, and change our model and repository accordingly.
2. Insert a partial { url } into the site model first, then update it with the additional information. This should make $addToSet usable, but may break other queues that require date to always be present (testing needed).
3. Let the race condition happen as-is: $push the feed first, and use a background queue to detect duplicates and remove them later.
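Workaround 3 boils down to a deduplication pass over the feeds array. Below is a minimal Python sketch of that pass, assuming (hypothetically) that the entry with the most recent date should win and that dates are ISO-8601 strings, which compare correctly as plain strings; dedupe_feeds is an illustrative name.

```python
def dedupe_feeds(feeds):
    """Keep one entry per url (hypothetical policy: the latest date wins).

    Relies on dicts preserving insertion order (Python 3.7+), so each url
    keeps the position of its first occurrence.
    """
    latest = {}
    for feed in feeds:
        url = feed["url"]
        # ISO-8601 date strings compare correctly as plain strings.
        if url not in latest or feed["date"] > latest[url]["date"]:
            latest[url] = feed
    return list(latest.values())

feeds = [
    {"url": "rss", "date": "2014-01-01"},
    {"url": "atom", "date": "2014-01-01"},
    {"url": "rss", "date": "2014-01-02"},
]
print(dedupe_feeds(feeds))
# [{'url': 'rss', 'date': '2014-01-02'}, {'url': 'atom', 'date': '2014-01-01'}]
```

Writing the cleaned array back with $set is itself a read-modify-write, so a feed pushed between the background job's read and its write would be lost, which is the same hazard as the $set scenario above.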
(There might be a fourth solution if upsert worked with a positional query, but as far as I know MongoDB v2.4 doesn't support that yet.)
So I wonder whether there are better alternatives for resolving this kind of race condition, or whether there are best practices for it.
You might want to have a look at TokuMX, a fork of MongoDB that supports transactions (besides a few other useful things).