How to structure/implement multidimensional data / data cube

Question

I've been reading into what a data cube is and there are lots of resources saying what it is and why (OLAP/ business intelligence / aggregations on specific columns) you would use one but never how.

Most of the resources seem to be referencing relational data stores but it doesn't seem like you have to use an RDBMS.

But nothing seems to show how you structure the schema and how to query efficiently to avoid the slow run time of aggregating on all of this data. The best I could find was this edx class that is "not currently available": Developing a Multidimensional Data Model.

Vitaliy Fedorchenko · Accepted Answer

You probably already know that there are 2 different OLAP approaches:

MOLAP that requires data load step to process possible aggregations (previously defined as 'cubes'). Internally MOLAP-based solution pre-calculates measures for possible aggregations, and as result it is able to execute OLAP queries very fast. Most important drawbacks of this approach come from the fact that MOLAP acts as a cache: you need to re-load input data to refresh a cube (this can take a lot of time - say, hours), and full reprocessing is needed if you decide to add new dimensions/measures to your cubes. Also, there is a natural limit of the dataset size + cube configuration.
ROLAP doesn't try to pre-process input data; instead of that it translates OLAP query to database aggregate query to calculate values on-the-fly. "R" means relational, but approach can be used even with NoSQL databases that support aggregate queries (say, MongoDb). Since there is no any data cache users always get actual data (on contrast with MOLAP), but DB should able to execute aggregate queries rather fast. For relatively small datasets usual OLTP databases could work fine (SQL Server, PostgreSql, MySql etc), but in case of large datasets specialized DB engines (like Amazon Redshift) are used; they support efficient distributed usage scenario and able to processes many TB in seconds.

Nowadays it is a little sense to develop MOLAP solution; this approach was actual >10 years ago when servers were limited by small amount of RAM and SQL database on HDD wasn't able to process GROUP BY queries fast enough - and MOLAP was only way to get really 'online analytical processing'. Currently we have very fast NVMe SSD, and servers could have hundreds gigabytes of RAM and tens of CPU cores, so for relatively small database (up to TB or a bit more) usual OLTP databases could work as ROLAP backend fast enough (execute queries in seconds); in case of really big data MOLAP is almost unusable in any way, and specialized distributed database should be used in any way.

Rich · Answer

The general wisdom is that cubes work best when they are based on a 'dimensional model' AKA a star schema that is often (but not always) implemented in an RDBMS. This would make sense as these models are designed to be fast for querying and aggregating.

Most cubes do the aggregations themselves in advance of the user interacting with them, so from the user perspective the aggregation/query time of the cube itself is more interesting than the structure of the source tables. However, some cube technologies are nothing more than a 'semantic layer' that passes through queries to the underlying database, and these are known as ROLAP. In those cases, the underlying data structure becomes more important.

The data interface presented to the user of the cube should be simple from their perspective, which would often rule out non-dimensional models such as basing a cube directly on an OLTP system's database structure.

How to structure/implement multidimensional data / data cube

Tags:

database

aggregate

business-intelligence

olap-cube

cube

diplosaurus

2 Answers

Vitaliy Fedorchenko

Rich

Recent Activity

Donate For Us

How to structure/implement multidimensional data / data cube

Tags:

database

aggregate

business-intelligence

olap-cube

cube

diplosaurus

2 Answers

Vitaliy Fedorchenko

Rich

Related questions

Recent Activity

Donate For Us