Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding the TimeSeriesDataSet in pytorch forecasting

Here is a code sample taken from one of pytorch forecasting tutorila:

# create dataset and dataloaders
max_encoder_length = 60
max_prediction_length = 20

training_cutoff = data["time_idx"].max() - max_prediction_length

context_length = max_encoder_length
prediction_length = max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="value",
    categorical_encoders={"series": NaNLabelEncoder().fit(data.series)},
    group_ids=["series"],
    # only unknown variable is "value" - and N-Beats can also not take any additional variables
    time_varying_unknown_reals=["value"],
    max_encoder_length=context_length,
    max_prediction_length=prediction_length,
)

validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff + 1)
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)

I don't really understand how the validation dataset is done with respect to the time index. I also don't understand why there is no test dataset in the tutorial. is it for a specific reason?

like image 299
lalaland Avatar asked Oct 28 '25 03:10

lalaland


1 Answers

Concerning validation dataset:

training dataset is all data except the last max_prediction_length data points of each Time series (each time series correspond to datapoints with same group_ids). Those last datapoints are filtered by the training cutoff (cutoff is the same for each time series because they are of same size)

validation data is the last max_prediction_length data points use as targets for each time series (which mean validation data are the last encoder_length + max_prediction_length of each time series). This is done by using parameter min_prediction_idx=training_cutoff + 1 which make the dataset taking only data with time_index with value superior to training_cutoff + 1 (minimal decoder index is always >= min_prediction_idx)

like image 148
ThomaS Avatar answered Oct 30 '25 02:10

ThomaS



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!