I want to store hierarchical, two-dimensional scientific datasets in a relational database (MySQL or SQLite). Each dataset contains a table of numerical data with an arbitrary number of columns. In addition, each dataset can have one or more children of the same type associated with a given row of its table. Each dataset typically has between 1 and 100 columns and between 1 and 1.000.000 rows. The database should be able to handle many datasets (>1000) and reading/writing of data should be reasonably fast.
What would the best DB schema to store such kind of data? Is it reasonable to have a "master" table with the names, IDs and relations of individual datasets and in addition one table per dataset which contains the numerical values?
Is it reasonable to have a "master" table with the names, IDs and relations of individual datasets and in addition one table per dataset which contains the numerical values?
That's how I'd do it.
I'm not exactly sure how the 'arbitrary columns' thing is working, because data usually doesn't work like that. Regardless, it sounds like storing it as row,col,val might work nicely.
Honestly though, if you don't need to search through it (max, min, etc.), it might be better to use some kind of flat file.
An alternative setup that might be interesting is using SQLite, with a separate database file for each dataset, plus one master one.
Whatever you pick, how well it will work really depends on what you're going to do with the data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With