I would like to store thousands to millions of tensors with different shapes to disk. The goal is to use them as a time series dataset. The dataset will probably not fit into memory and I will have to load samples or ranges of samples from disk.
What is the best way to accomplish this while keeping storage and access time low?
One way would be to call np.save('file.npy', a.numpy()) and then convert the array back to a tensor after loading.
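As a rough sketch of that NumPy round trip (the array `arr` is a hypothetical stand-in for `a.numpy()`, and the temporary path is only for illustration), `np.load` with `mmap_mode="r"` maps the file so that a single slice can be read without loading everything. The result could then be passed to `torch.from_numpy` to get a tensor back.

```python
import os
import tempfile

import numpy as np

# Stand-in for `a.numpy()`; any ndarray works the same way
arr = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.npy")
    np.save(path, arr)

    # mmap_mode="r" maps the file instead of reading it all into memory
    mapped = np.load(path, mmap_mode="r")
    first = np.array(mapped[0])  # copy just one slice into memory
    del mapped  # release the memory map before the directory is removed

print(first.shape)  # (3, 4)
```

Note that a single .npy file holds one array of one shape, so tensors with different shapes would need separate files or some other container.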
Tensor data is commonly stored as a single array laid out as one contiguous block of memory. Concretely, a 3x3x3 tensor is stored as a flat array of 27 values, one after the other.
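To make the flat layout concrete, here is a minimal sketch using NumPy (an assumption on my part; the same idea applies to PyTorch tensors). The names `t` and `flat` are only for illustration.

```python
import numpy as np

# A 3x3x3 array holding the values 0..26 (row-major / C order by default)
t = np.arange(27).reshape(3, 3, 3)

# ravel() exposes the same buffer as a flat 1-D view
flat = t.ravel()

print(flat.shape)               # (27,)
print(t.flags["C_CONTIGUOUS"])  # True

# Element [i, j, k] sits at flat index i*9 + j*3 + k
i, j, k = 2, 1, 0
print(t[i, j, k] == flat[i * 9 + j * 3 + k])  # True
```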
Two tensors of the same size can be added together by using the + operator or the add function to get an output tensor of the same shape.
The easiest way to save anything to disk is with pickle:
import pickle
import torch
a = torch.rand(3,4,5)
# save
with open('filename.pickle', 'wb') as handle:
    pickle.dump(a, handle)
# open
with open('filename.pickle', 'rb') as handle:
    b = pickle.load(handle)
You can also save tensors with PyTorch directly using torch.save, which relies on pickle internally for serialization:
import torch
x = torch.tensor([0, 1, 2, 3, 4])
torch.save(x, 'tensor.pt')
If you want to save multiple tensors in one file, you can wrap them in a dictionary:
import torch
x = torch.tensor([0, 1, 2, 3, 4])
a = torch.rand(2,3,4,5)
b = torch.zeros(37)
torch.save({"a": a, "b": b, "x": x}, 'tensors.pt')
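Reading the dictionary back is done with torch.load. The following is a minimal sketch, assuming the same three tensors as above; the temporary directory and the name `loaded` are only for illustration.

```python
import os
import tempfile

import torch

tensors = {
    "a": torch.rand(2, 3, 4, 5),
    "b": torch.zeros(37),
    "x": torch.tensor([0, 1, 2, 3, 4]),
}

# Write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tensors.pt")
    torch.save(tensors, path)
    loaded = torch.load(path)  # returns the dictionary of tensors

print(sorted(loaded.keys()))
print(torch.equal(loaded["x"], tensors["x"]))
```

By default torch.load reads the whole file, so for a dataset that does not fit in memory it is probably better to write one file per sample or per chunk of samples rather than one large dictionary.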