I am using the $sample stage in a MongoDB aggregation, in the following manner:
db.col.aggregate([
  {$match: {topic: topic}},
  {$sample: {'size': 10}},
  {$project: {_id: 1}}
])
My question is: is there a way to set the 'seed' for the sampling, so that I get the same result every time I run this command?
For example, in Python I do it like this:
import random
list_of_items = [...]
# set the seed to 0
random.seed(0)
# get sample
samples = random.sample(list_of_items, 10)
By manually defining the seed, I make sure that the result is the same every time I do this operation.
One workaround we have used for similar issues is to add an $out stage after $sample to create a 'snapshot' collection. We then run our experiments against the 'snapshot' collection, so their behavior is reproducible.
Another advantage is that we can index the 'snapshot' collection as needed to speed up our experiments.
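A minimal sketch of the snapshot pipeline in Python, assuming the same topic filter as in the question. The helper name build_snapshot_pipeline and the collection name sample_snapshot are made up for illustration; the function only builds the pipeline and does not connect to a database.

```python
def build_snapshot_pipeline(topic, size=10, snapshot_name="sample_snapshot"):
    """Build a pipeline that samples once and stores the result.

    `snapshot_name` is a hypothetical collection name chosen for this sketch.
    """
    return [
        # Keep only the documents for the requested topic.
        {"$match": {"topic": topic}},
        # Draw the random sample (not seedable, so this runs only once).
        {"$sample": {"size": size}},
        # Write the sampled documents to the snapshot collection.
        # $out must be the last stage of the pipeline.
        {"$out": snapshot_name},
    ]


pipeline = build_snapshot_pipeline("some_topic", size=10)
```

With PyMongo you would pass this list to db.col.aggregate(pipeline) once, and afterwards read from db["sample_snapshot"] instead of sampling again. Note that $out replaces the target collection if it already exists, so rerunning the pipeline produces a new sample.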
You can use a workaround until the MongoDB team implements this feature.
Assign each document a random value in [0, 1], store it in a field, and then sort and limit by that field.
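A rough sketch of this idea in Python, under a few assumptions: the field name rand_key and both helper names are hypothetical, the random values are generated client-side with a seeded generator, and you write them back to the documents yourself (for example with one update per document). The code below only computes the values and builds the pipeline.

```python
import random


def assign_random_keys(doc_ids, seed=0):
    """Map each document id to a seeded random value in [0, 1).

    Ids are sorted first so the mapping does not depend on input order.
    """
    rng = random.Random(seed)
    return {doc_id: rng.random() for doc_id in sorted(doc_ids)}


def reproducible_sample_pipeline(topic, size=10, key_field="rand_key"):
    """Build a pipeline that replaces $sample with sort + limit.

    `key_field` is a hypothetical field holding the stored random value.
    """
    return [
        {"$match": {"topic": topic}},
        # Sort by the stored random value; _id breaks ties deterministically.
        {"$sort": {key_field: 1, "_id": 1}},
        {"$limit": size},
        {"$project": {"_id": 1}},
    ]


keys = assign_random_keys([3, 1, 2], seed=0)
pipeline = reproducible_sample_pipeline("some_topic", size=10)
```

Because the random values are stored in the documents, the same pipeline returns the same documents on every run, as long as the matching documents and their stored values do not change. An index on the key field can help the sort.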