I'm using Postgres as the database backend but I don't think that will matter. Also, I'd like to still use sqlite3
as the local development database, so ideally any approach will work for both.
By "weighted", I mean that some items in that database table are more likely to appear than others based on a heuristic value ranging from 0
to +inf
where 0
is "never going to be picked", 1
is "as equal to be picked as any other instance, and 2
is "twice as likely to be picked than any other instance".
I've read other SO posts about pulling random instances of models, but as far as I've seen there are no ways to do this quickly involving weights.
My model has a weight DecimalField, which can be updated at any time, even during runtime. What I'm after is an approach that is faster than the solutions I've tried below, or an explanation of why one of those solutions is the fastest I can get.
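For concreteness, here is a minimal sketch of what such a model might look like; the model name ContentItem and the title field are made up for illustration, and only the weight DecimalField comes from the question:

```python
from decimal import Decimal

from django.db import models


class ContentItem(models.Model):
    # Hypothetical name and fields; only the `weight` DecimalField is from the question.
    title = models.CharField(max_length=255)
    # 0 = never picked, 1 = baseline likelihood, 2 = twice the baseline, and so on.
    weight = models.DecimalField(max_digits=6, decimal_places=2, default=Decimal("1.0"))
```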
I want to select "fresh" stuff from a database table but still give the chance of seeing some of the older content. If some content has been viewed too frequently or it is not well-received, it should appear less frequently. Ideally, I would be able to control how frequently this is: "ah, so this will appear 1.5 times more than other content on the site."
First solution I tried: pick a random instance, then use 1 * weight / instances_count as the probability of keeping it; otherwise throw the item back and make another random choice. This seems pretty slow, and the random nature of the "throw back" means it might never terminate. In general this is really janky and I would hate to use it. I'm putting it first because it's pretty "simple" and I'm most likely to dismiss it.
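A rough sketch of that rejection loop, assuming the hypothetical ContentItem model above (the keep-probability is the one described, weight / instances_count, not a claim about the best formula):

```python
import random

from myapp.models import ContentItem  # hypothetical app/model from the sketch above


def pick_weighted_rejection():
    """Pick a uniform random row, keep it with probability weight / instances_count.

    The retry loop is unbounded and can run for a long time when weights are
    small compared to the number of rows.
    """
    count = ContentItem.objects.count()
    while True:
        # OFFSET into a random row; each attempt costs another query.
        candidate = ContentItem.objects.order_by("pk")[random.randrange(count)]
        if random.random() < float(candidate.weight) / count:
            return candidate
```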
Second solution: SELECT all of the rows, compute sum_of_all_weights, and roll a sum_of_all_weights-sided die to pick an instance by walking the cumulative weights. The problem is that this algorithm seems slow for millions of rows. This is the "naive" solution.
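A sketch of that naive version, again assuming the hypothetical ContentItem model: fetch every weight, roll once, and walk the cumulative total.

```python
import random

from myapp.models import ContentItem  # hypothetical, as above


def pick_weighted_naive():
    """Roll once in [0, sum_of_all_weights) and scan the cumulative sum until we pass it."""
    rows = list(ContentItem.objects.values_list("pk", "weight"))
    total = sum(float(weight) for _, weight in rows)
    roll = random.uniform(0, total)
    running = 0.0
    for pk, weight in rows:
        running += float(weight)
        if roll < running:
            return ContentItem.objects.get(pk=pk)
    return None  # only reached if the table is empty or all weights are zero
```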
Third solution: for each item, insert weight more instances with the same meta-information, so that a plain uniform random pick becomes weighted. A caveat with this is that only integer-based weightings are possible. Also, the performance problem is offloaded from the SELECT to the INSERT and DELETE operations.
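One way that duplication idea could be sketched, assuming an integer weight and a hypothetical helper that keeps the copy count in sync ("same meta-information" is approximated here by matching on the made-up title field):

```python
from myapp.models import ContentItem  # hypothetical, as above


def sync_copies(item, new_weight):
    """Keep exactly `new_weight` identical copies of `item` so a uniform pick is weighted.

    The cost moves from the SELECT to these INSERT/DELETE calls, and only
    integer weights are representable.
    """
    copies = ContentItem.objects.filter(title=item.title)
    current = copies.count()
    if current > new_weight:
        # Delete the surplus copies beyond the first `new_weight`.
        surplus = list(copies.order_by("pk").values_list("pk", flat=True)[new_weight:])
        ContentItem.objects.filter(pk__in=surplus).delete()
    else:
        for _ in range(new_weight - current):
            ContentItem.objects.create(title=item.title, weight=item.weight)
```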
Fourth solution: store a throw_back_probability field instead of weight. If it is 0.0, the instance will never be thrown back. Otherwise, roll a die and throw the instance back (picking again) depending on the throw_back_probability. This ultimately solves the problem, but it still probably requires more database calls and is an indirect solution.
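A sketch of that variant, assuming the hypothetical model stores a throw_back_probability DecimalField in place of weight:

```python
import random

from myapp.models import ContentItem  # hypothetical; assumes a throw_back_probability field


def pick_with_throw_back():
    """Pick a uniform random row, then re-roll while the throw-back die says to discard it."""
    count = ContentItem.objects.count()
    while True:
        candidate = ContentItem.objects.order_by("pk")[random.randrange(count)]
        # 0.0 means "never thrown back"; higher values make the item appear less often.
        if random.random() >= float(candidate.throw_back_probability):
            return candidate
```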
It's SO, so I'm sure there are clever annotation-based solutions to this (or something similar) that I'm overlooking. Thanks in advance for the help!
You can combine sharding with any of the approaches you list. Choose a number of shards (preferably with number of rows / number of shards significantly greater than log(number of rows) to avoid empty shards with high probability), assign each row a uniform random shard ID, and make the shard ID the first entry of the primary key so that the table is sorted by shard. To sample, choose a uniform random shard and then sample within the shard. This is inaccurate to the extent that the shard totals are unbalanced, but if the shards are large enough, then the law of large numbers will kick in. (If the shards are too large, though, then that starts to defeat the point of sharding.)
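A minimal sketch of how that sharding could sit on top of the naive cumulative-sum pick, assuming the hypothetical ContentItem model gains an indexed integer shard field (the field name, NUM_SHARDS value, and helpers are illustrative, not from the answer):

```python
import random

from myapp.models import ContentItem  # hypothetical; assumes an indexed integer `shard` field

NUM_SHARDS = 256  # keep rows-per-shard well above log(total rows) so shards are rarely empty


def assign_shard():
    """Uniform random shard ID, assigned to each row when it is inserted."""
    return random.randrange(NUM_SHARDS)


def pick_weighted_sharded():
    """Pick a uniform random shard, then do the weighted pick only within that shard."""
    shard = random.randrange(NUM_SHARDS)
    rows = list(ContentItem.objects.filter(shard=shard).values_list("pk", "weight"))
    if not rows:
        return None  # rare when shards are sized sensibly; retry with another shard in practice
    total = sum(float(weight) for _, weight in rows)
    roll = random.uniform(0, total)
    running = 0.0
    for pk, weight in rows:
        running += float(weight)
        if roll < running:
            return ContentItem.objects.get(pk=pk)
```

The accuracy caveat from the answer applies: the pick is only as balanced as the shard weight totals, so larger shards trade query speed for better accuracy.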