I am generating a large dataset for a machine learning application: a numpy array with shape (N, X, Y), where N is the number of samples, X is the input of a sample, and Y is the target of a sample. I want to save this array in the .npy format. N is very large, so the final dataset is 10+ GB. This means I cannot create the whole dataset in memory and then save it, as that would exhaust my RAM.
Is it possible instead to write batches of n samples to this file iteratively? For example, I would like to append batches of 256 samples (shape (256, X, Y)) to the file, one batch at a time.
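Conceptually, the pattern I am after looks like this (generate_batch is just a stand-in for my real data generation code):

import numpy as np

def generate_batch(n, X=3, Y=2):
    # stand-in for the real sample generator
    return np.random.random((n, X, Y))

for _ in range(100):
    batch = generate_batch(256)  # shape (256, X, Y)
    # append `batch` to 'dataset.npy' here -- but how?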
Here is a solution based on numpy's implementation of save that writes a standard .npy file, including shape and dtype information:
import numpy as np
import numpy.lib as npl

a = np.random.random((30, 3, 2))
a1 = a[:10]
a2 = a[10:]
filename = 'out.npy'

with open(filename, 'wb+') as f:
    # Write a header describing only the first batch; it serves as a
    # placeholder and is overwritten with the final shape below.
    header = npl.format.header_data_from_array_1_0(a1)
    npl.format.write_array_header_1_0(f, header)
    # Append the raw bytes of each batch after the header.
    a1.tofile(f)
    a2.tofile(f)
    # Seek back to the start of the file and rewrite the header with
    # the total number of samples.
    f.seek(0)
    header['shape'] = (len(a1) + len(a2), *header['shape'][1:])
    npl.format.write_array_header_1_0(f, header)

assert (np.load(filename) == a).all()
This works for C-contiguous arrays that do not contain Python objects. Note that rewriting the header in place relies on the final header occupying the same number of bytes as the placeholder; numpy pads .npy headers (to a multiple of 64 bytes in recent versions), so this holds as long as the longer final shape string does not push the header across a padding boundary.
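Building on the same idea, here is a minimal sketch of an incremental writer for the batched use case from the question; generate_batch, the batch count, and the sample sizes are illustrative assumptions, not part of the original code:

import numpy as np
import numpy.lib as npl

def generate_batch(n, X=3, Y=2):
    # placeholder for the real sample generator
    return np.random.random((n, X, Y))

filename = 'dataset.npy'
num_batches, batch_size = 100, 256

with open(filename, 'wb+') as f:
    total = 0
    header = None
    for _ in range(num_batches):
        # the technique assumes C-contiguous data, so enforce it
        batch = np.ascontiguousarray(generate_batch(batch_size))
        if header is None:
            # write a placeholder header sized from the first batch
            header = npl.format.header_data_from_array_1_0(batch)
            npl.format.write_array_header_1_0(f, header)
        batch.tofile(f)
        total += len(batch)
    # patch the header with the final sample count
    f.seek(0)
    header['shape'] = (total, *header['shape'][1:])
    npl.format.write_array_header_1_0(f, header)

assert np.load(filename, mmap_mode='r').shape == (num_batches * batch_size, 3, 2)

The first batch fixes the dtype and the per-sample shape; every later batch must match them. Loading with mmap_mode='r' lets you inspect the result without pulling the whole file into memory.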