Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Storing Scientific Data in a Relational Database

I want to store hierarchical, two-dimensional scientific datasets in a relational database (MySQL or SQLite). Each dataset contains a table of numerical data with an arbitrary number of columns. In addition, each dataset can have one or more children of the same type associated with a given row of its table. Each dataset typically has between 1 and 100 columns and between 1 and 1.000.000 rows. The database should be able to handle many datasets (>1000) and reading/writing of data should be reasonably fast.

What would the best DB schema to store such kind of data? Is it reasonable to have a "master" table with the names, IDs and relations of individual datasets and in addition one table per dataset which contains the numerical values?

like image 561
ThePhysicist Avatar asked Sep 07 '25 12:09

ThePhysicist


1 Answers

Is it reasonable to have a "master" table with the names, IDs and relations of individual datasets and in addition one table per dataset which contains the numerical values?

That's how I'd do it.

I'm not exactly sure how the 'arbitrary columns' thing is working, because data usually doesn't work like that. Regardless, it sounds like storing it as row,col,val might work nicely.

Honestly though, if you don't need to search through it (max, min, etc.), it might be better to use some kind of flat file.

An alternative setup that might be interesting is using SQLite, with a separate database file for each dataset, plus one master one.

Whatever you pick, how well it will work really depends on what you're going to do with the data.

like image 114
zebediah49 Avatar answered Sep 10 '25 01:09

zebediah49