Dealing with Relational Data
With the basic structure of DynamoDB in mind, how do we go about building an application that has related data that would normally be split up in a relational database? For this example, imagine we have a software consultancy firm with a number of employees working on projects for clients. In addition, we want to track the technology used within these projects. Because everyone loves a good entity relationship diagram, the database may be structured like this:
Approaching the Problem
When using a relational database, these entities would be normalized into different tables, and we would use SQL queries to combine them dynamically into views our application can use. DynamoDB tries to reduce the computational overhead of such queries, and instead tries to store data in the format it would be consumed. In fact, according to AWS a well-designed application requires only one table! How can this be so?
If we tried to use Dynamo like RDBMS
You may imagine that we would create an individual table for clients, projects, employees, and technologies, however, this pattern doesn’t leverage DynamoDB in the most effective way. If we have a scenario where we need to delete a client, we would want their projects deleted as well. If we used multiple tables, our process would look like:
Find all client projects
Delete all projects
Delete the client
Problems:
Multiple API calls
Very error-prone as code gets refactored and extended
We lose all advantages of working with a NoSQL solution and have to deal with all the disadvantages
New Application Requirements
Let’s rethink the way our application is structured to leverage a NoSQL paradigm instead. If we use partition keys, sort keys, and indexes effectively, we can model that relational data in a way that allows complex queries, deletions, and insertions. Our new table may look like:
Note how the partition key is the same for both projects and clients, but the sort key is different. This allows you to set up relations to different items by querying on PK for the equivalent of a join. This pattern of design is called the Adjacency List Design Pattern, and you can read more about it in the DynamoDB context here.
Review
Modeling Relationships
Just like relational databases, there are guidelines for effectively dealing with one-to-one, many-to-one, and many-to-many data.
We store related items close together using partition keys.
Taking advantage of Sort Keys
The sort key lets us control the granularity of our data and the order in which it’s returned.
Partition keys often tell us what “cabinet” or “bucket” our item belongs to.
Sort keys are like folders in the cabinet, containing files.
Access Patterns
Unlike relational databases, we need to know what kind of queries we’re going to execute in advance to understand how to store our data.
We design our keys and any additional indexes based on the business questions we expect to ask most frequently.
Conclusions
Relational database systems (RDBMS) and DynamoDB have different strengths and weaknesses.
RDBMS
Pros
Ensures schema and relationships in data
Easier to maintain transactions
Data can be queried flexibly and store data efficiently
Cons
Must maintain schemas during development and migrate changes
Queries are relatively expensive and don’t scale well in high-traffic situations (see First Steps for Modeling Relational Data in DynamoDB)
DynamoDB
Pros
Highly available and fast
Scales automatically
Schemaless so easier to change
Cons
Need to rethink normalization and consistency
Must understand the queries and materialized views when designing
Can only sort on sort keys or secondary indexes
As with any solution, there is no silver bullet. There are many ways to solve the problems at hand, and we try to find the solution that fits that need the best. For more information on DynamoDB and best practices, please visit: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html
Comments