
Randomness in TensorFlow Dataset map function

I have a Dataset of raw timeseries data that I have stored in TFRecords on disk:

import tensorflow as tf

dataset = tf.data.TFRecordDataset(tfrecords)  # tfrecords is a list of filenames
dataset = dataset.map(do_something)  # do_something parses one record and slices it
dataset = dataset.shuffle(1024)
dataset = dataset.repeat()
dataset = dataset.batch(128)

What I would like my do_something function to do is, for each raw instance, take a random slice of the data so that I have a small window of data from the instance. On the next epoch, I would like to get a different random slice from each instance. My main question is: if I introduce randomness into the map function (i.e. my do_something function), will it:

  1. Just take random slices once from each raw instance and then continue to iterate over those same slices on each epoch.
  2. Give me different random slices from each of the raw instances on each epoch.

I desire (2), so if that is not happening, is there an alternative way to achieve it?

For example, say I have 100 initial samples, each a timeseries of 50 data points. I want to generate 2000 samples of smaller slices, say 5-data-point slices. If I randomly select slices in my map function, will I just get the same 100 5-data-point slices on every repeat, or is there a way to get 100 different 5-data-point slices every time I cycle through the 100 (50-data-point) initial samples?

adamconkey asked Sep 05 '25 18:09



1 Answer

You will get different random slices each epoch. Each epoch will call your map function again, so as long as your map function generates different slices each time it's called, you will get different slices.
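Here is a minimal sketch of that behavior, not the asker's actual pipeline. It assumes the example from the question (100 series of 50 points, 5-point windows) and uses synthetic in-memory data in place of the TFRecords; random_window, SERIES_LEN and WINDOW are hypothetical names chosen for illustration. One assumption worth stating: the start index is drawn with a TensorFlow op (tf.random.uniform), because Dataset.map traces the function into a graph, so Python-side randomness such as random or NumPy would likely be evaluated once at trace time and then reused for every element.

```python
import tensorflow as tf

SERIES_LEN = 50  # points per raw timeseries (from the question's example)
WINDOW = 5       # points per random slice


def random_window(series):
    # Sample the start index with a TensorFlow op so it is re-drawn each time
    # an element passes through map(), including on later epochs.
    start = tf.random.uniform(
        [], minval=0, maxval=SERIES_LEN - WINDOW + 1, dtype=tf.int32
    )
    # tf.slice keeps a static output shape of (WINDOW,).
    return tf.slice(series, [start], [WINDOW])


# Synthetic stand-in for the parsed TFRecords: row i holds the values
# i*50, i*50+1, ..., i*50+49, so each window can be traced back to its row.
raw = tf.reshape(tf.range(100 * SERIES_LEN, dtype=tf.float32), [100, SERIES_LEN])

# Same map-then-repeat order as in the question, limited to two epochs here.
dataset = tf.data.Dataset.from_tensor_slices(raw).map(random_window).repeat(2)

windows = [w.numpy().tolist() for w in dataset]
epoch_1, epoch_2 = windows[:100], windows[100:]

# Count how many of the 100 instances received a different slice on epoch 2.
changed = sum(a != b for a, b in zip(epoch_1, epoch_2))
print(f"{changed} of 100 instances got a different window on the second epoch")
```

Because repeat() comes after map(), the second pass over the data runs the map function again on every instance, so most instances should come out with a different window. A few may repeat by chance, since there are only 46 possible start positions.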

AAudibert answered Sep 08 '25 08:09
