 

Deleting an h5py dataset, but the file size doubles

I want to modify one of the datasets in an h5py file, then delete the old one and add the new one.

I use the __delitem__() function to delete the old dataset. It seems to successfully remove that item from the keys of the h5py file, but the file size doubles. Can anyone give advice on how to actually delete a dataset from an h5py file? Thanks a lot.

This is my code:

import numpy as np
import h5py

# suppose I have hdf5 file names stored in: h5_files

for name in h5_files:
    with h5py.File(name, "a") as f:
        x = f["x_data"]
        np_x = np.array(x)  # load the dataset into memory

        # do something to np_x, but keep dtype and shape the same as x

        f.__delitem__("x_data")  # remove the old dataset
        f.create_dataset("x_data", data=np_x)

The original h5py file is 997.3 MB, but after running the code above the file size roughly doubles to 2.0 GB.

asked Sep 10 '25 by Dong Li


1 Answer

I might be wrong, but I think that dataset deletion only removes the name (the link) of the dataset; the data itself remains in the file. That would explain the doubling of the file size.
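You can check this yourself with a quick experiment; here is a minimal sketch (the file name demo.h5 is just a placeholder):

import os

import h5py
import numpy as np

# create a file with a single large dataset
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("x_data", data=np.zeros((1000, 1000)))
print(os.path.getsize("demo.h5"))  # size with the dataset present

# delete the dataset and look at the size again
with h5py.File("demo.h5", "a") as f:
    del f["x_data"]  # unlinks the name only
print(os.path.getsize("demo.h5"))  # roughly unchanged: the space is not reclaimed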

If you really need to "delete" a dataset and reclaim the space, copy everything except that dataset into a new hdf5 file. As far as I remember, this was the only work-around I was able to find to achieve the same thing.
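Here is a minimal sketch of that work-around, assuming the single top-level dataset layout from the question; the helper name replace_dataset and the .tmp suffix are my own:

import os

import h5py

def replace_dataset(name, key, new_data):
    """Rewrite the file so that `key` holds `new_data`,
    copying every other top-level object into a fresh file."""
    tmp_name = name + ".tmp"
    with h5py.File(name, "r") as src, h5py.File(tmp_name, "w") as dst:
        for item in src:
            if item != key:
                src.copy(item, dst)  # copies groups/datasets recursively
        dst.create_dataset(key, data=new_data)
        # (root-level attributes, if any, are not copied by this sketch)
    os.replace(tmp_name, name)  # swap the compact rewrite into place

# usage with the loop from the question:
# for name in h5_files:
#     with h5py.File(name, "r") as f:
#         np_x = np.array(f["x_data"])
#     # ... modify np_x ...
#     replace_dataset(name, "x_data", np_x)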

Note: instead of f.__delitem__("x_data") you can use del f["x_data"].

answered Sep 13 '25 by ziky