Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

User defined data objects - what is the best data storage strategy?

I am building a system that allows front-end users to define their own business objects. Defining a business object involves creating data fields for that business object and then relating it to other business objects in the system - fairly straight forward stuff. My question is, what is the most efficient storage strategy?

The requirements are:

  • Must support business objects with potentially 100+ fields (of all common data types)
  • The system will eventually support hundreds of thousands of business object instances
  • Business objects sometimes display data and aggregates from their relationships with other business objects
  • Users must be able to search for business objects by their data fields (and fields from related business objects)

The two possible solutions I can envisage are:

  • Have a dynamic schema such that when a new business object type is created a new table is created for storing instances of that object. The object's fields become columns in the storage table.
  • Have a fixed schema where instance data fields are stored as rows in basically a big long table.

I can see pros and cons to both approaches:

  • the dynamic schema allows me to index search columns
  • the dynamic tables are potentially limited in width by the max column size
  • dynamic schemas rule out / cause issues with replication
  • the static schema means less or even no dynamic sql generation
  • my guess is the static schema may perform like a dog when it comes to searching across 100,000+ objects

So what is the best soution? Is there another approach I haven't thought of?

Edit: The requirement I have been given is to build a generic system capable of supporting front-end user defined business objects. There will of course be restrictions on how these objects can be constructed and related, but the requirement itself is not up for negotiation.

My client is a service provider and requires a degree of flexibility in servicing their own clients, hence the need to create business objects.

like image 220
flesh Avatar asked Dec 29 '25 00:12

flesh


1 Answers

I think your problem matches very well to a graph database like Neo4j, as it's built for the requested kind of flexibility from the beginning. It stores data as nodes and relationships/edges, and both nodes and relationships can hold arbitrary properties (in a key/value fashion). One important difference to a RDBMS is that a graph database won't need to lookup the relationships in a big long table (like in your fixed schema solution), so there should be a significant performance gain there. You can find out about language bindings for Neo4j in the wiki and read what others say about it in this stackoverflow thread. Disclaimer: I'm part of the Neo4j team.

like image 199
nawroth Avatar answered Dec 30 '25 18:12

nawroth