I have a Dataset of raw timeseries data that I have stored in TFRecords on disk:
import tensorflow as tf
dataset = tf.data.TFRecordDataset(tfrecords)  # tfrecords is a list of filenames
dataset = dataset.map(do_something)  # do_something parses and slices one record
dataset = dataset.shuffle(1024)
dataset = dataset.repeat()
dataset = dataset.batch(128)
What I would like my do_something function to do is, for each raw instance, take a random slice of the data so that I have a small window of data from the instance. But on the next epoch, I would like to ensure that I get a different random slice from each instance. My main question is: if I introduce randomness into the map function (i.e. my do_something function), will it:
(1) produce the same random slice for each instance on every epoch, or
(2) produce a different random slice for each instance on every epoch?
I desire (2), so if that is not happening, is there an alternative way to achieve it?
For example, say I have 100 initial samples, each a timeseries of 50 data points. I want to generate 2000 samples of smaller slices, say 5-data-point slices. If I randomly select slices in my map function, will I just get the same 100 5-data-point slices on every repeat, or is there a way that I can get 100 different 5-data-point slices every time I cycle through the 100 (50-data-point) initial samples?
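You will get different random slices each epoch. Every pass over the data produced by repeat re-runs your map function on each element, so as long as the function generates a different slice each time it is executed, you will get different slices. One caveat: the randomness should come from TensorFlow ops such as tf.random.uniform, because map traces the Python function once into a graph, and a value drawn with Python's random module or NumPy during tracing would be baked in as a constant.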
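A minimal sketch of this behaviour, using the numbers from the question (100 series of 50 points, 5-point windows). The names random_window and WINDOW are hypothetical, and an in-memory tensor stands in for the parsed TFRecords; shuffling is left out so the two epochs can be compared row by row.

```python
import tensorflow as tf

WINDOW = 5  # slice length (assumed, taken from the 5-data-point example)

def random_window(series):
    # series: 1-D tensor holding one full timeseries (50 points here).
    # The start index comes from a TensorFlow random op, so it is re-drawn
    # every time the dataset graph processes an element, on every epoch.
    max_start = tf.shape(series)[0] - WINDOW + 1  # exclusive upper bound
    start = tf.random.uniform([], minval=0, maxval=max_start, dtype=tf.int32)
    return tf.slice(series, [start], [WINDOW])

# Stand-in for parsed records: row i holds the values i*50 ... i*50 + 49.
raw = tf.reshape(tf.range(100 * 50), [100, 50])

dataset = tf.data.Dataset.from_tensor_slices(raw)
dataset = dataset.map(random_window)
dataset = dataset.repeat(2)    # two epochs
dataset = dataset.batch(100)   # one batch per epoch, in the original order

epoch1, epoch2 = [batch.numpy() for batch in dataset]

# Count how many of the 100 series got a different window in the second epoch.
num_changed = int((epoch1[:, 0] != epoch2[:, 0]).sum())
print("series with a different slice in epoch 2:", num_changed)
```

Note that placing cache() after the map would store the first epoch's slices and replay them, so in this sketch the random map is kept after any caching step.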