I've been reading into what a data cube is and there are lots of resources saying what it is and why (OLAP/ business intelligence / aggregations on specific columns) you would use one but never how.
Most of the resources seem to be referencing relational data stores but it doesn't seem like you have to use an RDBMS.
But nothing seems to show how you structure the schema and how to query efficiently to avoid the slow run time of aggregating on all of this data. The best I could find was this edx class that is "not currently available": Developing a Multidimensional Data Model.
You probably already know that there are 2 different OLAP approaches:
Nowadays it is a little sense to develop MOLAP solution; this approach was actual >10 years ago when servers were limited by small amount of RAM and SQL database on HDD wasn't able to process GROUP BY queries fast enough - and MOLAP was only way to get really 'online analytical processing'. Currently we have very fast NVMe SSD, and servers could have hundreds gigabytes of RAM and tens of CPU cores, so for relatively small database (up to TB or a bit more) usual OLTP databases could work as ROLAP backend fast enough (execute queries in seconds); in case of really big data MOLAP is almost unusable in any way, and specialized distributed database should be used in any way.
The general wisdom is that cubes work best when they are based on a 'dimensional model' AKA a star schema that is often (but not always) implemented in an RDBMS. This would make sense as these models are designed to be fast for querying and aggregating.
Most cubes do the aggregations themselves in advance of the user interacting with them, so from the user perspective the aggregation/query time of the cube itself is more interesting than the structure of the source tables. However, some cube technologies are nothing more than a 'semantic layer' that passes through queries to the underlying database, and these are known as ROLAP. In those cases, the underlying data structure becomes more important.
The data interface presented to the user of the cube should be simple from their perspective, which would often rule out non-dimensional models such as basing a cube directly on an OLTP system's database structure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With