I am using h5py to store experiment data in an HDF5 container.
In an interactive session I open the file using:
measurement_data = h5py.File('example.hdf5', 'a')
Then I write data to the file using some self-written functions (often many GB of data from an experiment lasting a couple of days). At the end of the experiment I would usually close the file using
measurement_data.close()
Unfortunately, from time to time the interactive session ends without my explicitly closing the file (the session is accidentally killed, there is a power outage, or the OS crashes because of some other software). This always results in a corrupt file and the loss of all the data. When I try to open it, I get the error:
OSError: Unable to open file (File signature not found)
I also cannot open the file in HDFview, or any other software I tried.
Opening and closing the file for every write access seems unattractive to me, because I am continuously writing data from many different functions and threads, so I would prefer a different solution.
Encodings: HDF5 supports two string encodings, ASCII and UTF-8.
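As a minimal sketch, assuming h5py 3.x (for `Dataset.asstr()`), this is one way to choose the encoding explicitly when writing string datasets. The file and dataset names are hypothetical.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "strings.hdf5")  # hypothetical file

with h5py.File(path, "w") as f:
    # Variable-length string types with an explicit encoding.
    ascii_dt = h5py.string_dtype(encoding="ascii")
    utf8_dt = h5py.string_dtype(encoding="utf-8")
    # Python str objects are encoded according to the dataset's string type.
    f.create_dataset("ascii_labels", data=["sample", "probe"], dtype=ascii_dt)
    f.create_dataset("utf8_labels", data=["Messung", "Größe"], dtype=utf8_dt)

with h5py.File(path, "r") as f:
    # In h5py 3.x string data is read as bytes; .asstr() decodes it to str.
    ascii_labels = list(f["ascii_labels"].asstr()[()])
    utf8_labels = list(f["utf8_labels"].asstr()[()])
    # check_string_dtype reports which encoding is stored in the file.
    ascii_encoding = h5py.check_string_dtype(f["ascii_labels"].dtype).encoding
    utf8_encoding = h5py.check_string_dtype(f["utf8_labels"].dtype).encoding
```

Non-ASCII text such as "Größe" needs the UTF-8 type; writing it to an ASCII-typed dataset would fail.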
The h5py package is a Pythonic interface to the HDF5 binary data format. It lets you store huge amounts of numerical data, and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they were real NumPy arrays.
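A small sketch of that NumPy-style slicing, using a modest stand-in dataset rather than a multi-terabyte one (file and dataset names are hypothetical). Indexing a `Dataset` reads only the selected region from disk and returns a NumPy array.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "slicing.hdf5")  # hypothetical file

with h5py.File(path, "w") as f:
    # 1000 x 100 float64 values standing in for a much larger on-disk dataset.
    f.create_dataset("signal", data=np.arange(100_000, dtype="f8").reshape(1000, 100))

with h5py.File(path, "r") as f:
    dset = f["signal"]        # no data has been read yet, only metadata
    print(dset.shape)         # (1000, 100)
    # Rows 10-11, every 25th column: only this selection is read from disk.
    block = dset[10:12, ::25]
    # Read one full column and reduce it with ordinary NumPy operations.
    column_mean = dset[:, 0].mean()
```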
This is probably due to your chunk layout: the smaller the chunks, the more bloated your HDF5 file becomes. Try to find a balance between a chunk size that suits your access pattern and the per-chunk storage overhead it adds to the HDF5 file.
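To illustrate the overhead, this sketch writes identical data twice with different chunk shapes and compares the resulting file sizes. The helper name, data shape, and chunk shapes are illustrative choices, not recommendations; the right chunk size depends on how the data is written and read.

```python
import os
import tempfile

import h5py
import numpy as np

def file_size_with_chunks(chunks, n_rows=20_000, n_cols=8):
    """Hypothetical helper: write the same data with a given chunk shape, return file size in bytes."""
    path = os.path.join(tempfile.mkdtemp(), "chunks.hdf5")
    data = np.random.default_rng(0).random((n_rows, n_cols))
    with h5py.File(path, "w") as f:
        # maxshape=(None, n_cols) makes the dataset appendable along the first axis.
        f.create_dataset("data", data=data, chunks=chunks, maxshape=(None, n_cols))
    return os.path.getsize(path)

# One row per chunk: 20,000 tiny chunks, each needing its own index entry.
tiny = file_size_with_chunks((1, 8))
# 1024 rows per chunk (64 KiB of float64): only 20 chunks to index.
larger = file_size_with_chunks((1024, 8))
print(tiny, larger)
```

Passing `chunks=True` instead lets h5py guess a chunk shape, which can be a reasonable starting point before tuning.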
View metadata of an HDF5 object: to view the metadata of a data object in HDFView, right-click the object and select 'Show Properties'. A window opens and displays metadata such as the name, type, attributes, data type, and data space.
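The same information can also be read programmatically. This is an h5py-based alternative to the HDFView dialog, not part of HDFView itself; the `describe` helper, dataset path, and attributes are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "metadata.hdf5")  # hypothetical file

with h5py.File(path, "w") as f:
    # Intermediate group "run1" is created automatically.
    dset = f.create_dataset("run1/temperature", data=np.zeros((3, 4), dtype="f4"))
    dset.attrs["units"] = "K"            # hypothetical attributes for illustration
    dset.attrs["sample_rate_hz"] = 10.0

def describe(obj):
    """Collect roughly what HDFView's 'Show Properties' window shows."""
    info = {"name": obj.name, "type": type(obj).__name__, "attributes": dict(obj.attrs)}
    if isinstance(obj, h5py.Dataset):
        info["dtype"] = str(obj.dtype)   # data type
        info["shape"] = obj.shape        # data space (dimensions)
    return info

with h5py.File(path, "r") as f:
    props = describe(f["run1/temperature"])
    print(props)
```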
The corruption problem is known to the HDF5 designers. At the time of writing, they were working on fixing it in version 1.10 by adding journaling. In the meantime, you can call flush() periodically so that buffered writes are written out to the file, which should limit the damage. You can also use external links, which let you store pieces of the data in separate files but link them together into one structure when you read them.
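A minimal sketch combining both suggestions, under these assumptions: each day of the experiment is written to its own part file, a hypothetical `write_part` helper stands in for the self-written acquisition functions, and `File.flush()` is called every few blocks. A small master file then ties the parts together with `h5py.ExternalLink`, so a crash should at worst damage the part file currently being written. Note that `flush()` hands HDF5's buffers to the operating system; it may not force the data onto the physical disk.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()

def write_part(path, n_blocks, block_size=100, flush_every=10):
    """Hypothetical helper: append blocks to a resizable dataset, flushing periodically."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("trace", shape=(0,), maxshape=(None,),
                                dtype="f8", chunks=(1024,))
        for i in range(n_blocks):
            block = np.full(block_size, float(i))
            start = dset.shape[0]
            dset.resize((start + block_size,))   # grow, then fill the new tail
            dset[start:start + block_size] = block
            if (i + 1) % flush_every == 0:
                # Ask HDF5 to write its buffered data and metadata to the file.
                f.flush()

part_paths = []
for day in range(2):
    part_path = os.path.join(workdir, f"day{day}.hdf5")
    write_part(part_path, n_blocks=25)
    part_paths.append(part_path)

# The master file only holds links; the data stays in the part files.
master_path = os.path.join(workdir, "master.hdf5")
with h5py.File(master_path, "w") as master:
    for day, part_path in enumerate(part_paths):
        master[f"day{day}"] = h5py.ExternalLink(part_path, "/trace")

with h5py.File(master_path, "r") as master:
    # Accessing a link opens the corresponding part file transparently.
    total = sum(master[f"day{day}"].shape[0] for day in range(2))
    last_value = float(master["day0"][total // 2 - 1])
```

Splitting by day is only one option; any granularity works as long as each part file is closed once it is complete.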