So I'm trying to figure out the structure behind general use cases of a CQRS+ES architecture and one of the problems I'm having is how aggregates are represented in the event store. If we divide the events into streams, what exactly would a stream represent? In the context of a hypothetical inventory management system that tracks a collection of items, each with an ID, product code, and location, I'm having trouble visualizing the layout of the system.
From what I could gather on the internet, it could be described succinctly "one stream per aggregate." So I would have an Inventory aggregate, a single stream with ItemAdded, ItemPulled, ItemRestocked, etc. events each with serialized data containing the Item ID, quantity changed, location, etc. The aggregate root would contain a collection of InventoryItem objects (each with their respective quantity, product codes, location, etc.) That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this; when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems be very inefficient with a large number of items.
Another method would be to have one stream per InventoryItem tracking all events pertaining to only item. Each stream is named with the ID of that item. That seems like the simpler route, but now how would you enforce domain rules like ensuring product codes are unique or you're not putting multiple items into the same location? It seems like you would now have to bring in a Read model, but isn't the whole point to keep commands and query's seperate? It just feels wrong.
So my question is 'which is correct?' Partially both? Neither? Like most things, the more I learn, the more I learn that I don't know...
An aggregate is a cluster of associated objects that we treat as a unit for the purpose of data changes. Each aggregate has a root and a boundary. The boundary defines what is inside the aggregate. The root is a single, specific entity contained in the aggregate.
With event streaming, there's a continuous stream of data flowing between systems, with the data representing a new state of events broadcast using the pub-sub pattern. Event sourcing, on the other hand, stores every new event in an append-log.
Event sourcing has several benefits: It solves one of the key problems in implementing an event-driven architecture and makes it possible to reliably publish events whenever state changes. Because it persists events rather than domain objects, it mostly avoids the object‑relational impedance mismatch problem.
CQRS is implemented by a separation of responsibilities between commands and queries, and event sourcing is implemented by using the sequence of events to track changes in data.
In a typical event store, each event stream is an isolated transaction boundary. Any time you change the model you lock the stream, append new events, and release the lock. (In designs that use optimistic concurrency, the boundaries are the same, but the "locking" mechanism is slightly different).
You will almost certainly want to ensure that any aggregate is enclosed within a single stream -- sharing an aggregate between two streams is analogous to sharing an aggregate across two databases.
A single stream can be dedicated to a single aggregate, to a collection of aggregates, or even to the entire model. Aggregates that are part of the same stream can be changed in the same transaction -- huzzah! -- at the cost of some contention and a bit of extra work to do when loading an aggregate from the stream.
The most commonly discussed design assigns each logical stream to a single aggregate.
That seems like it would allow for easily enforcing domain rules, but I see one major flaw to this; when applying those events to the aggregate root, you would have to first rebuild that collection of InventoryItem. Even with snapshotting, that seems be very inefficient with a large number of items.
There are a couple of possibilities; in some models, especially those with a strong temporal component, it makes sense to model some "entities" as a time series of aggregates.  For example, in a scheduling system, rather than Bobs Calendar you might instead have Bobs March Calendar, Bobs April Calendar and so on.  Chopping the life cycle into smaller installments can keep the event count in check.
Another possibility is snapshots, with an additional trick to it: each snapshot is annotated with metadata that describes where in the stream the snapshot was made, and you simply read the stream forward from that point.
This, of course, depends on having an implementation of an event stream that supports random access, or an implementation of stream that allows you to read last in first out.
Keep in mind that both of these are really performance optimizations, and the first rule of optimization is... don't.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With