I have an interesting scenario that is, I believe, an excellent application of an IMDB (such as H2) and, possibly, jOOQ. However, there are some interesting challenges and questions that arise.
We’ve developed a specialized, Java-based ETL platform for insurance data conversion that is now in its fourth generation. Without going into unnecessary detail, we routinely extract data from source systems such as SQL Server, DB2, etc. that are normalized to varying degrees. Insurance data conversion has two characteristics that are highly relevant here:
We typically convert one insurance entity (i.e. policy, application, claim, etc.) at a time (unless it’s part of a package or other transactional grouping, in which case we might be converting a few entities at a time). Importantly, therefore, a given conversion transaction seldom involves even 1 Mb of data at a time. Indeed, a typical transaction involves less than 50K of data—miniscule by any modern measure.
Because source and target systems can differ so dramatically in their schemas, granularity, and even underlying semantics, the transformations can be very complex. In terms of source processing, the queries are numerous and complex, frequently joining many tables, using subqueries, etc. Given this fact, obtaining reasonable performance means saving the query results in some fashion. Until now, we’ve relied on a proprietary approach involving “insurance maps,” which are specialized Java maps. We knew this approach was ultimately insufficient, but it served our needs initially.
Now that I have some time to reflect, I’m thinking about a long term approach. If we just consider the basic characteristics above, it would be seem that an IMDB like H2 would be perfect:
Execute all the complex queries against the source database (e.g. SQL Server) up-front, creating tables, performing inserts/updates, in order to create an IMDB representation of all the data that pertains to a single conversion transaction (e.g. a single insurance policy). BTW, I could see how jOOQ could be really helpful here (and elsewhere) for simlifying and increasing the type safety of these queries.
Execute all the complex transformation queries against the IMDB. Again, jOOQ might have significant benefits.
Discard and recreate the IMDB for each insurance conversion transaction.
One of the things that I love about this approach (at least with H2) is the ability to encapsulate queries in Java-based stored procedures—much better than writing T-SQL stored procs. And would it again make things even easier/safer to use jOOQ against the IMDB instead of, for example, the native H2 stored proc API?
However, I have two concerns:
I apologize for the length of this post, but this is a critically important question for our team. Thank you so much, in advance, for your responses.
A bit off topic... One thing to remember is that H2 is non-distributed database and a rather primitive solution thus at best. Essentially, this is a what-ever-fits-in-on-heap-of-a-single-JVM database. There are better approaches unless you are talking about absolutely simplistic use case (which I don't think you are).
GridGain's In-Memory Database, for example, uses H2 for its SQL processing internally (with all its benefits) but also provides full distribution for SQL as well as host of other features. There are other distributed in-memory databases and even some sophisticated data grids that can fit your use case.
Just my 2 cents here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With