Introduction to DynamoDB and Modeling Relational Data (PART 2)
Updated: Sep 15, 2020
Getting started with DynamoDB
AWS makes it very easy to get started with DynamoDB. Simply log into your AWS account and navigate to DynamoDB > Tables.
Click Create Table
Enter a table name and a partition key name, plus an optional sort key if you wish (more on that later).
Congratulations! You have created your first DynamoDB table. No EC2 provisioning, ports to configure, or schemas to set up. You are ready to start inserting data into your new table.
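The same table can also be described in code. The following is a minimal sketch of the request parameters for DynamoDB's CreateTable API, written as a plain Python dict so it runs without an AWS account. The table name `BookCheckouts`, the attribute names, and the `key_names` helper are hypothetical and chosen only for illustration.

```python
# Request parameters for DynamoDB's CreateTable API, as a plain dict.
# Table and attribute names are hypothetical examples.
create_table_params = {
    "TableName": "BookCheckouts",
    # Only key attributes are declared up front; other attributes are schemaless.
    "AttributeDefinitions": [
        {"AttributeName": "employee_id", "AttributeType": "S"},
        {"AttributeName": "checkout_date", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "employee_id", "KeyType": "HASH"},     # partition key
        {"AttributeName": "checkout_date", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",
}


def key_names(params):
    """Return (partition_key, sort_key_or_None) from a CreateTable request."""
    by_type = {k["KeyType"]: k["AttributeName"] for k in params["KeySchema"]}
    return by_type["HASH"], by_type.get("RANGE")


# With boto3 installed and AWS credentials configured, the table could be
# created like this (not executed here):
#   import boto3
#   boto3.client("dynamodb").create_table(**create_table_params)
```

Note that `HASH` and `RANGE` are the API's names for the partition key and sort key, which is why the two sets of terms are used interchangeably.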
Basic anatomy of a DynamoDB Table
DynamoDB is a NoSQL (non-relational) database, and it handles data very differently from a relational database. It fits more closely into the category of a key/value store than a document database like MongoDB. While DynamoDB can store items in a familiar object structure, the way that data is accessed is very different, so designing your keys is an important first step.
Partition / Hash Keys
A partition refers to the physical storage location where an item is kept. DynamoDB hashes the partition key to choose that location, which means items that share a partition key are stored close together. A great way to visualize a partition key is as a filing cabinet drawer. When looking for your document, the partition key tells you which drawer to look in. For simple data, the partition key may serve as the primary key on its own, but for more complex and related data, the partition key helps group related items together.
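To make the filing cabinet picture concrete, here is a rough sketch of routing items to drawers by hashing the partition key. DynamoDB's real hash function and partition count are internal to the service, so MD5 and `NUM_PARTITIONS` are stand-in assumptions, and the employee IDs and book titles are made up.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages the partition count itself


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition number using a stand-in hash (MD5)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Each "drawer" holds every item whose partition key hashes to it.
drawers = {}
for employee_id, book in [("emp-1", "Dune"), ("emp-2", "Emma"), ("emp-1", "Ulysses")]:
    drawers.setdefault(partition_for(employee_id), []).append((employee_id, book))
```

The same key always hashes to the same drawer, so both of `emp-1`'s checkouts end up together no matter when they were written.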
Sort / Range Keys
A sort key is used to sort the items within a given partition. Combined with the partition key, it also forms a composite key that uniquely identifies each item. A great way to visualize a sort key is to imagine the alphabetical folders within a filing cabinet drawer. The documents are sorted in alphabetical order within the drawer.
How are they used?
When making a query, DynamoDB uses the partition key to locate the partition the data resides on before applying the rest of the query conditions. This has powerful implications. As your data grows, if your partitions are well distributed, your requests will be just as fast regardless of how much data is in your table. A query can be made on the partition key by itself, or you may provide a sort key condition to further limit the items returned.
The sort key serves two purposes: sorting and searching. You can do a partial search on a sort key (for example, a prefix match or a range comparison), unlike a partition key, which must be an exact match. The sort key can also be used to return your items in ascending or descending order. One limitation of DynamoDB is that the sort key is the only way to have a query return sorted results. You cannot sort on arbitrary attributes. Indexes, covered later, help with that. In addition, you can only have one partition key and one sort key per base table.
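The query behavior described here can be sketched with a toy in-memory table. This is an illustration of the access pattern only, not how DynamoDB is implemented. The `Table` class and its data are hypothetical. In the real Query API, the prefix match corresponds to a `begins_with` key condition and the sort direction to the `ScanIndexForward` parameter.

```python
class Table:
    """Toy model: items grouped by partition key, kept sorted by sort key."""

    def __init__(self):
        self.partitions = {}  # partition key -> list of (sort key, item)

    def put(self, pk, sk, item):
        rows = self.partitions.setdefault(pk, [])
        # The (pk, sk) pair is unique: writing it again replaces the old item.
        rows[:] = [row for row in rows if row[0] != sk]
        rows.append((sk, item))
        rows.sort(key=lambda row: row[0])

    def query(self, pk, sk_prefix="", ascending=True):
        # The partition key must match exactly; the sort key may match partially.
        rows = self.partitions.get(pk, [])
        matches = [item for sk, item in rows if sk.startswith(sk_prefix)]
        return matches if ascending else matches[::-1]


table = Table()
table.put("emp-1", "2020-09-10", {"book": "Emma"})
table.put("emp-1", "2020-08-03", {"book": "Dune"})
table.put("emp-2", "2020-09-01", {"book": "Ulysses"})

all_for_emp1 = table.query("emp-1")                    # Dune, then Emma
september = table.query("emp-1", sk_prefix="2020-09")  # Emma only
newest_first = table.query("emp-1", ascending=False)   # Emma, then Dune
```

Storing dates as ISO-style strings is a deliberate choice in this sketch: they sort chronologically as plain text, so a prefix such as `2020-09` selects a whole month.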
Scan vs Query
A query takes a very targeted approach: it identifies the partition the data resides on before applying any filter. Queries are the preferred method for accessing your data because they are quicker and consume fewer read capacity units (RCUs). However, not all searches can be conducted with a query. If you don’t know the partition key and want to search for a specific attribute value, you will have to use a scan. A scan reads every item in your table and returns only those that match the filter expression, so you pay for all the items read, not just the ones returned. This is an expensive operation and should be used sparingly.
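The cost difference is easier to see by counting how many items each approach has to read. The sketch below is a simplified model with made-up data and hypothetical `query` and `scan` helpers. Real RCU consumption also depends on item size and read consistency, which this ignores.

```python
from collections import defaultdict

# Hypothetical checkout records.
ITEMS = [
    {"employee_id": "emp-1", "checkout_date": "2020-08-03", "book": "Dune"},
    {"employee_id": "emp-1", "checkout_date": "2020-09-10", "book": "Emma"},
    {"employee_id": "emp-2", "checkout_date": "2020-09-01", "book": "Ulysses"},
    {"employee_id": "emp-3", "checkout_date": "2020-09-05", "book": "Dune"},
]

# Group items by partition key, as the table does internally.
by_partition = defaultdict(list)
for item in ITEMS:
    by_partition[item["employee_id"]].append(item)


def query(employee_id):
    """Read only one partition. Returns (matches, items_read)."""
    rows = by_partition.get(employee_id, [])
    return rows, len(rows)


def scan(attribute, value):
    """Read every item, then filter. Returns (matches, items_read)."""
    items_read = 0
    matches = []
    for item in ITEMS:
        items_read += 1
        if item.get(attribute) == value:
            matches.append(item)
    return matches, items_read


query_matches, query_reads = query("emp-1")
scan_matches, scan_reads = scan("book", "Dune")
```

Here the query touches only the two items in `emp-1`'s partition, while the scan reads all four items to return two. On a table with millions of items, that gap is what makes scans expensive.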
Approaching the Query Problem
You may run into scenarios where you need to sort on additional fields, submit queries with different partition keys, or allow for searches on other fields. How do we handle those scenarios with the limitations mentioned above? Indexes, of course!
A Tale of Two Indexes
We can have our cake and eat it too by defining a new index. These indexes are a lot like an index in a book. There is a map of the contents, so we can find things quickly. Let’s imagine a table where we stored books, the employees who checked them out, and when.
1. Global Secondary Indexes
Global secondary indexes allow you to define a new partition key and, optionally, a new sort key. When making a query, you select the index you wish to use along with the partition key and sort key you want to query on. A GSI keeps its own copy of the attributes projected from the base table, and this copy is maintained seamlessly in the background. You can create GSIs after the table has been created, and the default quota is 20 GSIs per table. There are some patterns you can use to optimize your GSIs, such as overloading.
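Conceptually, a GSI is the same data re-filed under a different key. The sketch below rebuilds the book checkout example as an index keyed by book title and sorted by checkout date. The `build_gsi` helper and the sample records are hypothetical, and the real service maintains this copy for you as the base table changes.

```python
# Hypothetical base table items, keyed by employee_id + checkout_date.
CHECKOUTS = [
    {"employee_id": "emp-1", "checkout_date": "2020-09-10", "book_title": "Dune"},
    {"employee_id": "emp-3", "checkout_date": "2020-08-03", "book_title": "Dune"},
    {"employee_id": "emp-2", "checkout_date": "2020-09-01", "book_title": "Emma"},
    {"employee_id": "emp-4", "checkout_date": "2020-09-02"},  # no book_title
]


def build_gsi(items, partition_attr, sort_attr):
    """Re-key items by a new partition attribute, sorted by a new sort attribute."""
    index = {}
    for item in items:
        # Items missing an index key attribute are left out of the index.
        if partition_attr not in item or sort_attr not in item:
            continue
        index.setdefault(item[partition_attr], []).append(item)
    for rows in index.values():
        rows.sort(key=lambda it: it[sort_attr])
    return index


by_title = build_gsi(CHECKOUTS, "book_title", "checkout_date")
dune_history = by_title["Dune"]  # who checked out Dune, oldest first
```

With this index, "who checked out Dune, and when?" becomes a query on one index partition instead of a scan of the whole table.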
2. Local Secondary Indexes
Local secondary indexes allow you to create additional sort keys on the base table. They use the same partition key as the base table but provide additional attributes to sort or search on. Unlike a GSI, an LSI can only be created when the table is created. The sort key of the base table is projected into the index, where it acts as a non-key attribute. Because LSIs share the base table’s partitions, a table with an LSI is limited to 10 GB of data per partition key value, counting both the table items and their index entries. You can have at most 5 LSIs per table.
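Because LSIs must be declared at table creation, they appear in the CreateTable request itself. The following sketch shows what such a request might look like as a plain dict, together with a small hypothetical checker (`lsi_problems`) for the two rules described above. The table, index, and attribute names are illustrative assumptions.

```python
# CreateTable request parameters with one local secondary index.
# Names are hypothetical examples.
table_params = {
    "TableName": "BookCheckouts",
    "AttributeDefinitions": [
        {"AttributeName": "employee_id", "AttributeType": "S"},
        {"AttributeName": "checkout_date", "AttributeType": "S"},
        {"AttributeName": "book_title", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "employee_id", "KeyType": "HASH"},
        {"AttributeName": "checkout_date", "KeyType": "RANGE"},
    ],
    "LocalSecondaryIndexes": [
        {
            "IndexName": "by_title",
            "KeySchema": [
                {"AttributeName": "employee_id", "KeyType": "HASH"},  # same as table
                {"AttributeName": "book_title", "KeyType": "RANGE"},  # new sort key
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

MAX_LSIS = 5  # per-table limit on local secondary indexes


def hash_key(key_schema):
    """Return the partition (HASH) key attribute name from a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def lsi_problems(params):
    """List rule violations among the LSIs in a CreateTable request."""
    problems = []
    indexes = params.get("LocalSecondaryIndexes", [])
    if len(indexes) > MAX_LSIS:
        problems.append(f"too many LSIs: {len(indexes)} > {MAX_LSIS}")
    table_pk = hash_key(params["KeySchema"])
    for index in indexes:
        if hash_key(index["KeySchema"]) != table_pk:
            problems.append(f"{index['IndexName']}: partition key must be {table_pk}")
    return problems
```

Calling `lsi_problems(table_params)` returns an empty list for the request above. Catching these mistakes locally is worthwhile because an LSI cannot be added or fixed once the table exists.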