
DynamoDB Design Pattern

I'm currently trying to design a database pattern to store data that needs the ability to scale on demand. I'm looking at DynamoDB to complete this task. I'm not familiar with NoSQL design patterns and am having some trouble with the design. My data set is tied to a camera system that tracks people entering and exiting a room.

My current design plan is to have a table that has the device id for the particular camera as the primary key. Every 5 minutes, the camera will send the total number into the room, the total out of the room, the group id (to track a room as a whole where there are multiple entrances/exits), and a timestamp.

My issue is that DynamoDB seems to want only one entry for a given primary key. Whenever I try to add a new entry, it overwrites my existing data.

I was thinking that a design such as the following may work:

DeviceID: ID
{
    GroupID: ID,
    Entries: [
        {
            In: numIN, 
            Out: numOUT, 
            TimeStamp: time
        },
        // appending on each entry to the list
    ]
}

Am I using DynamoDB inefficiently? Is there a better way to go about this? It seems like making queries, such as "how many people were in room x on day y?" would be difficult.

Alex DeCamillo asked Nov 29 '25 20:11



2 Answers

Is it inefficient?

No, you're not using it inefficiently. DynamoDB is good at storing and retrieving groups of hierarchical data for a single element per request. Nesting/denormalizing your data so that a single device has an array of entries is recommended by AWS, since you can't do joins (an entries table and a devices table), and in my opinion you've designed for that correctly. https://aws.amazon.com/blogs/database/should-your-dynamodb-table-be-normalized-or-denormalized/ A drawback is that if you read the item, append, and put it back, you pull every entry for the single device on each update, but given that you make updates every 5 minutes this would seem tolerable. On a small app with low user traffic I do the same thing: I append to the user's list of info and then put the user back. DynamoDB is very cheap per request, so if you don't have millions of requests this is worth it in my opinion.
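One hedge on the read-then-put step: DynamoDB's UpdateItem operation supports a list_append function in a SET update expression, so a new entry can in principle be appended without pulling the whole item first. Below is a minimal sketch that only builds the request arguments. The function name build_append_entry_update and the sample values are hypothetical, the attribute names follow the question's design, and the resulting dict is meant to be passed to something like boto3's Table.update_item(**kwargs), which is not called here. Also keep in mind that a single DynamoDB item is capped at 400 KB, so a list that grows forever will eventually hit that limit.

```python
def build_append_entry_update(device_id, num_in, num_out, timestamp):
    """Build keyword arguments for an UpdateItem call that appends one
    reading to the device's Entries list (hypothetical helper)."""
    new_entry = {"In": num_in, "Out": num_out, "TimeStamp": timestamp}
    return {
        "Key": {"DeviceID": device_id},
        # if_not_exists supplies an empty list on the very first write,
        # list_append then concatenates the one-element list :new onto it.
        "UpdateExpression": (
            "SET Entries = list_append(if_not_exists(Entries, :empty), :new)"
        ),
        "ExpressionAttributeValues": {":empty": [], ":new": [new_entry]},
    }


# Sample values are made up for illustration.
kwargs = build_append_entry_update("cam-1", 4, 2, "2025-11-29T20:10:00Z")
```

This keeps each 5-minute update to a single write request; the trade-off is that reads still return the whole list.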

How do I run more complex Queries?

With DynamoDB you lose query flexibility in return for it being 100% managed and, in some cases, cheap per request. For more complex queries you can add Global Secondary Indexes so you can run queries that involve attributes other than the table's primary key. They have their own drawbacks, though: you still only get 2 properties per index (essentially a where clause on 2 columns), and each Global Secondary Index gets its own provisioned throughput, so you pay an additional flat rate for the new index. For me, a Global Secondary Index doesn't really help when the data you want to query against is denormalized, similar to how you're nesting your entries. In your case you wouldn't be able to use the In, Out, and TimeStamp fields in a Global Secondary Index because the "Entries" attribute is a document type. There are, however, other NoSQL databases that you could dump your entire device JSON object into, and they would index even the nested fields.
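To make the index limitation concrete, here is a rough sketch. The dict group_time_index is a hypothetical index definition shaped like one entry of the GlobalSecondaryIndexes parameter of DynamoDB's CreateTable operation; the index name and capacity numbers are made up. The helper indexable_attributes is also hypothetical and only illustrates the rule that index key attributes must be top-level String, Number, or Binary values.

```python
from decimal import Decimal

# Hypothetical index definition (name and capacity values are assumptions).
# It could only apply to items that carry GroupID and TimeStamp as
# top-level attributes, which the nested Entries design does not.
group_time_index = {
    "IndexName": "GroupID-TimeStamp-index",
    "KeySchema": [
        {"AttributeName": "GroupID", "KeyType": "HASH"},     # partition key
        {"AttributeName": "TimeStamp", "KeyType": "RANGE"},  # sort key
    ],
    "Projection": {"ProjectionType": "ALL"},
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}


def indexable_attributes(item):
    """Return the names of top-level attributes whose Python type maps to a
    DynamoDB String, Number, or Binary, i.e. candidates for an index key."""
    scalar_types = (str, int, Decimal, bytes)
    return sorted(
        name
        for name, value in item.items()
        if isinstance(value, scalar_types) and not isinstance(value, bool)
    )


nested_device = {
    "DeviceID": "cam-1",
    "GroupID": "room-x",
    "Entries": [{"In": 4, "Out": 2, "TimeStamp": "2025-11-29T20:10:00Z"}],
}
candidates = indexable_attributes(nested_device)
print(candidates)  # ['DeviceID', 'GroupID']
```

Entries is a list, so nothing inside it shows up as a candidate key, which is the limitation described above.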

Another Database for Complex Queries

I myself did not want to use another database because I thought I could get away with DynamoDB being my primary or only datastore, but if you need to ask "Give me x where A=1 AND B=2 AND C=3" it's really not possible. I have found it difficult to denormalize data while also keeping it query friendly. So instead I use DynamoDB to store and retrieve items, and AWS Elasticsearch Service to run queries across those items. In your case I would store devices with their nested entries in both DynamoDB and Elasticsearch. When I need to retrieve an individual device or entry, or pull anything by ID, it comes from DynamoDB. When I want to run analysis across any property, I use Elasticsearch.
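As a rough illustration of the search side, the sketch below flattens one nested device into one document per entry and then runs a multi-condition filter in plain Python. This is an assumption about document shape, not something the setup above prescribes (it describes storing the nested device as-is); flatten_device and the sample data are hypothetical, and no Elasticsearch client is involved.

```python
def flatten_device(device):
    """Turn one nested device item into one flat document per entry,
    copying DeviceID and GroupID onto each document."""
    return [
        {"DeviceID": device["DeviceID"], "GroupID": device["GroupID"], **entry}
        for entry in device.get("Entries", [])
    ]


# Made-up sample data following the question's layout.
device = {
    "DeviceID": "cam-1",
    "GroupID": "room-x",
    "Entries": [
        {"In": 4, "Out": 2, "TimeStamp": "2025-11-29T20:05:00Z"},
        {"In": 0, "Out": 3, "TimeStamp": "2025-11-29T20:10:00Z"},
    ],
}
docs = flatten_device(device)

# The "A=1 AND B=2 AND C=3" style of question, evaluated in memory here;
# a search engine would evaluate the same conditions against its index.
matches = [
    d for d in docs
    if d["GroupID"] == "room-x" and d["In"] > 0 and d["TimeStamp"].startswith("2025-11-29")
]
```

With flat documents, every field is a direct property that can be filtered on, which is the kind of query DynamoDB alone makes hard.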

Usman Mutawakil answered Dec 02 '25 03:12



It looks as though the best way to model this data is as a one-to-many model. I will use the DeviceID as my partition key and the timestamp as my sort key; the remaining attributes can be added to each item as well. Having a sort key also allows multiple items with the same partition key, because the primary key is then the combination of the partition key and the sort key, and items that share a partition key are stored sorted by the sort key. This model makes retrieving the data for a requested time interval much simpler.
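A minimal sketch of this model, with a plain Python dict standing in for the table so it runs without AWS. The helper names (make_reading, put, build_day_query, net_change_for_day) and the sample values are hypothetical. It also assumes that In and Out are per-interval counts and that timestamps are ISO 8601 strings, neither of which is stated above. build_day_query returns arguments shaped like the parameters of DynamoDB's Query operation; TimeStamp is aliased because TIMESTAMP is on DynamoDB's reserved-word list.

```python
from datetime import date


def make_reading(device_id, group_id, timestamp, num_in, num_out):
    """One item per reading: DeviceID is the partition key, TimeStamp the sort key."""
    return {"DeviceID": device_id, "TimeStamp": timestamp,
            "GroupID": group_id, "In": num_in, "Out": num_out}


def put(table, item):
    """In-memory stand-in for PutItem: an item is replaced only when both
    the partition key and the sort key match an existing item."""
    table[(item["DeviceID"], item["TimeStamp"])] = item


def build_day_query(device_id, day):
    """Arguments shaped like a DynamoDB Query for one device on one day."""
    return {
        "KeyConditionExpression": "DeviceID = :d AND begins_with(#ts, :day)",
        "ExpressionAttributeNames": {"#ts": "TimeStamp"},
        "ExpressionAttributeValues": {":d": device_id, ":day": day.isoformat()},
    }


def net_change_for_day(table, device_id, day):
    """Sum In minus Out over one device's readings for the given day."""
    prefix = day.isoformat()
    rows = [item for (dev, ts), item in table.items()
            if dev == device_id and ts.startswith(prefix)]
    return sum(r["In"] for r in rows) - sum(r["Out"] for r in rows)


table = {}
put(table, make_reading("cam-1", "room-x", "2025-11-29T20:05:00Z", 4, 2))
put(table, make_reading("cam-1", "room-x", "2025-11-29T20:10:00Z", 3, 1))
put(table, make_reading("cam-1", "room-x", "2025-11-30T08:00:00Z", 5, 0))

query = build_day_query("cam-1", date(2025, 11, 29))
net = net_change_for_day(table, "cam-1", date(2025, 11, 29))
```

All three readings survive because their sort keys differ. For a room with several cameras, the same query would have to run once per device, or against an index keyed on GroupID.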

Alex DeCamillo answered Dec 02 '25 04:12



