
Manually Setting the Seed for MongoDB $sample

I am using the $sample stage in a MongoDB aggregation pipeline, in the following manner:

db.col.aggregate([
    {$match: {topic: topic}},
    {$sample: {'size': 10}},
    {$project: {_id: 1}}
])

My question is: is there a way to set the 'seed' for the sampling, so that I get the same result every time I run this command?

For example, in Python I do it like this:

import random
list_of_items = [...]

# set the seed to 0 
random.seed(0)

# get sample 
samples = random.sample(list_of_items, 10)

By setting the seed manually, I make sure that the result is the same every time I run this operation.

Codious-JR asked Sep 12 '25 14:09


2 Answers

One workaround we have used for similar problems is to add an $out stage after $sample, which writes the sampled documents to a 'snapshot' collection. We then run our experiments against the snapshot collection, so the results are reproducible.

Another advantage is that we can create indexes on the snapshot collection to speed up our experiments as needed.
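As a rough illustration of the snapshot idea, here is a minimal Python sketch. It assumes you are using pymongo and pass in a `Database` object; the helper names `snapshot_pipeline` and `create_snapshot`, the collection name `sample_snapshot`, and the `topic` filter (taken from the question) are only placeholders, not part of any library.

```python
def snapshot_pipeline(topic, size=10, out_name="sample_snapshot"):
    """Build a pipeline that samples once and stores the result.

    out_name is a hypothetical collection name chosen for this sketch.
    $out has to be the last stage of the pipeline.
    """
    return [
        {"$match": {"topic": topic}},
        {"$sample": {"size": size}},
        {"$out": out_name},
    ]


def create_snapshot(db, topic, source="col", size=10, out_name="sample_snapshot"):
    """Run the sampling once and return the snapshot collection.

    db is assumed to be a pymongo Database. The sample is random only
    at this point; later reads of the snapshot return the same documents.
    """
    db[source].aggregate(snapshot_pipeline(topic, size, out_name))
    return db[out_name]
```

After `create_snapshot(db, "some-topic")` has run once, every later query goes against the stored snapshot rather than re-running $sample, and you can call `create_index` on the returned collection if the experiments need it.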

ray answered Sep 15 '25 13:09


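You can use a workaround until the MongoDB team implements this feature.

Assign each document a random value in [0, 1], stored in its own field, then sort by that field and limit the result. As long as the stored values do not change, the same documents come back on every run.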
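One possible way to sketch this in Python is shown below. It is a variation rather than exactly what the answer describes: instead of drawing the value from a random generator, it derives it from a hash of a seed and the document's _id, so the same seed always yields the same ordering. The field name `rand_key` and the helpers `seeded_key`, `assign_keys`, and `sample_pipeline` are hypothetical, and `collection` is assumed to be a pymongo `Collection`.

```python
import hashlib


def seeded_key(doc_id, seed=0):
    """Map (seed, doc_id) to a reproducible pseudo-random float in [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).digest()
    # Keep the top 53 bits so the result is an exact double below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign_keys(collection, seed=0, key_field="rand_key"):
    """Store the seeded key on every document (one update per document)."""
    for doc in collection.find({}, {"_id": 1}):
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {key_field: seeded_key(doc["_id"], seed)}},
        )


def sample_pipeline(topic, size=10, key_field="rand_key"):
    """Sort by the stored key and keep the first `size` documents."""
    return [
        {"$match": {"topic": topic}},
        # _id as a tie-breaker keeps the order fully determined.
        {"$sort": {key_field: 1, "_id": 1}},
        {"$limit": size},
        {"$project": {"_id": 1}},
    ]
```

Calling `assign_keys(collection, seed=0)` once and then `collection.aggregate(sample_pipeline(topic))` should return the same ids on each run; re-running `assign_keys` with a different seed gives a different, but again repeatable, selection. Indexing the key field would likely help on large collections.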

Poyoman answered Sep 15 '25 12:09