Things to consider before choosing DynamoDB
DynamoDB is the non-relational database service provided by Amazon Web Services (AWS), positioned as a fast, flexible and eventually consistent database solution for modern large-scale applications. Being eventually consistent by default (strongly consistent reads can be requested per operation), DynamoDB relaxes consistency in favor of availability and partition tolerance.
Even though DynamoDB is a NoSQL database system, it differs significantly from other NoSQL database systems like MongoDB or Apache CouchDB. Unlike with relational databases, it’s not an easy task to migrate data from DynamoDB to another NoSQL or SQL database system once your application has gone into production. Therefore, the fact that DynamoDB is AWS’s own NoSQL database should not, by itself, be the reason to select it as the database for your application. Let’s dive into some important factors to consider before choosing DynamoDB as the database system for your application.
Composite primary keys can contain a maximum of 2 attributes
In DynamoDB, a composite primary key is defined as the combination of a partition key and a sort key. Therefore, unlike in relational databases, we cannot define a composite primary key with more than 2 attributes. This leads to significant problems if you don’t think about how your data is going to be stored during the design phase.
For example, suppose you have a Students table. To uniquely identify a student you need all 3 attributes: grade, class and name. DynamoDB doesn’t allow you to key the data in this form because of the limitation mentioned above. Therefore, you either have to concatenate grade and class into one field or introduce a unique ID as an additional field for each student.
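The concatenation workaround could look roughly like the following Python sketch. The `#` separator, the attribute names `grade_class` and `name`, and the helper `student_key` are assumptions made for illustration; DynamoDB does not prescribe any of them.

```python
SEPARATOR = "#"  # assumed delimiter; choose one that cannot occur in the values


def student_key(grade: int, class_name: str, name: str) -> dict:
    """Fold three identifying attributes into a two-part primary key."""
    if SEPARATOR in class_name:
        raise ValueError("class name must not contain the separator")
    return {
        "grade_class": f"{grade}{SEPARATOR}{class_name}",  # partition key
        "name": name,  # sort key
    }


print(student_key(10, "B", "John"))
# {'grade_class': '10#B', 'name': 'John'}
```

The trade-off of this design is that grade and class can no longer be queried separately through the key, which is one reason to settle the access patterns before choosing it.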
Can query data only with keys and indexes
In SQL you can use any column in the where clause to select data, and indexes merely make the select operation faster. This is not the case with DynamoDB. In DynamoDB, you can query data only by keys (the partition key alone, or the partition key together with the sort key; you cannot use the sort key on its own) and by indexes. If you want to search on an attribute that is neither a key nor indexed, you have to scan through all the records of the table while performing a conditional check. This scan is performed at the database level, not at the application level, but it will still take a significant amount of time depending on the table size.
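To make the difference concrete, here is a minimal sketch of the request parameters for the two kinds of read. The parameter names follow the DynamoDB low-level API, while the table name `Students`, the attributes `grade_class` and `address`, and the two helper functions are hypothetical.

```python
def build_query_params(table: str, grade_class: str) -> dict:
    """Parameters for a Query: a condition on the partition key is mandatory."""
    return {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#pk": "grade_class"},
        "ExpressionAttributeValues": {":pk": {"S": grade_class}},
    }


def build_scan_params(table: str, address: str) -> dict:
    """Parameters for a Scan: every item is read, then the filter is applied."""
    return {
        "TableName": table,
        "FilterExpression": "#addr = :addr",
        "ExpressionAttributeNames": {"#addr": "address"},
        "ExpressionAttributeValues": {":addr": {"S": address}},
    }


query_params = build_query_params("Students", "10#B")
scan_params = build_scan_params("Students", "Colombo")
print(sorted(query_params))
print(sorted(scan_params))
```

With boto3, dictionaries like these could be passed as `client.query(**query_params)` and `client.scan(**scan_params)`. The calls are left out here so that the sketch runs without AWS credentials.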
There are 2 types of operations for retrieving data in DynamoDB: scan and query. The query operation is similar to a select operation in SQL that uses only indexed columns in the where clause. But there are some differences worth mentioning here. In SQL you can use indexed columns for selection while using non-indexed columns for projection.
For example, let’s say name is an indexed column of the students table and address is not. Then the following query doesn’t suffer any performance penalty.
SELECT name, address FROM students WHERE name = 'John';
The index on name is used to find a pointer to the actual record, and the value of address is then read from that record.
This works because in SQL the index holds a direct access pointer to the actual record or the bucket of records. In DynamoDB, however, creating an index results in creating a new table. We have to define which attributes are projected into that newly created index table. We can set the index to project all attributes of the parent table, but that increases the cost because you pay for the storage in DynamoDB. If you project only a selected set of attributes, only that set can be retrieved through that particular index. If you try to retrieve an attribute that is not projected into the index, the index alone cannot serve the request: a local secondary index makes DynamoDB fetch the missing attributes from the parent table at additional read cost, while a global secondary index cannot return them at all, so you need a separate read against the parent table. Therefore, you have to decide carefully which attributes to index and which attributes to project with each index.
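As a rough illustration of the projection decision, the sketch below describes an index in the shape used by the `GlobalSecondaryIndexes` entry of a CreateTable request and checks which requested attributes it would not cover. The index name `name-index`, the attribute names and the helper `missing_from_index` are assumptions, and the fragment leaves out other settings a real index definition may need.

```python
# Hypothetical index definition on the Students table.
NAME_INDEX = {
    "IndexName": "name-index",
    "KeySchema": [{"AttributeName": "name", "KeyType": "HASH"}],
    "Projection": {
        "ProjectionType": "INCLUDE",
        "NonKeyAttributes": ["address"],
    },
}

# Key attributes of the parent table are always projected into an index.
TABLE_KEY_ATTRIBUTES = {"grade_class", "name"}


def missing_from_index(index: dict, wanted: set) -> set:
    """Return the requested attributes the index does not carry."""
    projection = index["Projection"]
    if projection["ProjectionType"] == "ALL":
        return set()
    available = set(TABLE_KEY_ATTRIBUTES)
    available.update(k["AttributeName"] for k in index["KeySchema"])
    if projection["ProjectionType"] == "INCLUDE":
        available.update(projection.get("NonKeyAttributes", []))
    return wanted - available


print(sorted(missing_from_index(NAME_INDEX, {"name", "address", "phone"})))
# ['phone']
```

A check like this can be run against the planned access patterns before the index is created, since changing a projection later means rebuilding the index.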
Limited query capabilities from AWS web console
The AWS web console can be used to view the data in DynamoDB tables. But the console has very limited query capabilities, which makes development and testing tasks cumbersome. Some of the limitations are as follows.
- Only 100 items are displayed at once. If you want to check the 1000th item of a table, you have to press the next button 9 times until items 901 to 1000 are displayed.
- You cannot insert/update multiple data items in one operation. You have to insert/update them one by one.
- You can delete multiple items only by ticking their checkboxes. You cannot use queries/conditions to delete data in the console.
- You can export the existing data of a table to a CSV file, but there is no option to import that data again. Export is also possible only in batches of 100 items.
AWS provides a local client for DynamoDB. Unfortunately, this client cannot connect to a table hosted in AWS; it works only with a locally running DynamoDB server. Still, it is a good option for building and testing your queries during development.
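When the console’s 100-item pages get in the way, reading a table from a script is one alternative. The sketch below shows the usual paging loop: a Scan or Query response carries a `LastEvaluatedKey` while more items remain, and that key is handed back as the start key of the next request. The function `iterate_items` and the stand-in `fake_fetch` are hypothetical; in real use `fetch_page` would wrap a call such as boto3’s `client.scan(...)` with `ExclusiveStartKey`.

```python
from typing import Callable, Iterator, Optional


def iterate_items(fetch_page: Callable[[Optional[dict]], dict]) -> Iterator[dict]:
    """Yield every item, following LastEvaluatedKey until it is absent."""
    start_key = None
    while True:
        page = fetch_page(start_key)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            break


def fake_fetch(start_key: Optional[dict]) -> dict:
    """Stand-in for a paged read: 250 items served 100 at a time."""
    data = [{"id": i} for i in range(250)]
    offset = 0 if start_key is None else start_key["id"] + 1
    chunk = data[offset:offset + 100]
    response = {"Items": chunk}
    if offset + 100 < len(data):
        response["LastEvaluatedKey"] = chunk[-1]
    return response


items = list(iterate_items(fake_fetch))
print(len(items))  # 250
```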
Auto scaling doesn’t scale well
The performance of a DynamoDB table is determined by its provisioned read capacity units (RCU) and write capacity units (WCU). If you want a higher IOPS rate you have to increase these values, at a higher cost. But keeping RCU and WCU values high is not practical in the long run when the budget is taken into account. The solution AWS provides for this concern is auto scaling. With auto scaling you can let DynamoDB vary the provisioned read and write capacities dynamically depending on the load, so you can start with a lower provisioned capacity and have it scaled up to cater for peak loads.
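To get a feel for what a given RCU value buys, the following sketch estimates the read capacity needed for a steady request rate. It relies on the documented rule that one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads of that size. The function name `required_rcu` and the sample numbers are assumptions for illustration.

```python
import math

RCU_BLOCK_BYTES = 4096  # one read unit covers up to 4 KB of item data


def required_rcu(reads_per_second: int, item_size_bytes: int,
                 strongly_consistent: bool = True) -> int:
    """Estimate the provisioned RCUs needed for a steady read rate."""
    units_per_read = max(1, math.ceil(item_size_bytes / RCU_BLOCK_BYTES))
    total = reads_per_second * units_per_read
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half
    return math.ceil(total)


print(required_rcu(50, 4096))                             # 50
print(required_rcu(50, 4096, strongly_consistent=False))  # 25
print(required_rcu(50, 6000))                             # 100
```

Under these assumptions, 25 provisioned units cover 50 eventually consistent reads per second of small items but only 25 strongly consistent ones, so the consistency mode matters when sizing a table.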
The problem is that DynamoDB auto scaling does not react quickly. Even if you set scale-up and scale-down alarms, it won’t scale immediately to cater for peak/burst loads. According to our investigations, it takes nearly 15 minutes to scale up. To make this clearer, suppose you have set the provisioned read capacity to 25 and you start a load test on your application, which reads data from a DynamoDB table. Let’s assume the concurrency of the load test is 50 and the load is continuous. You will most likely notice that requests start to time out. If you then check the current provisioned capacity of the DynamoDB table in the AWS console, you will see that it’s still 25 and auto scaling has not been triggered yet. This behavior has been explained in detail here.
This is not actually a bug in DynamoDB auto scaling but the expected behavior according to the AWS documentation:
DynamoDB Auto Scaling is designed to accommodate request rates that vary in a somewhat predictable, generally periodic fashion. If you need to accommodate unpredictable bursts of read activity, you should use Auto Scaling in combination with DAX (read Amazon DynamoDB Accelerator (DAX) — In-Memory Caching for Read-Intensive Workloads to learn more).
So, if your application needs to respond to burst loads, you have to configure DAX for your DynamoDB tables as recommended. But DAX comes with limitations of its own. You can find a complete list here under usage notes. One annoying limitation is worth mentioning here: a VPC must be assigned to the DAX cluster, and DAX can only be accessed from an EC2 instance running inside the same VPC as the cluster. That means you cannot access DAX from your development machine even if your VPC has public internet connectivity, which makes development tasks almost impossible with DAX. DAX is also not included in the AWS Java SDK out of the box. You have to install it as a separately downloaded jar, since it is not available as a Maven dependency either.
I hope you now have a good understanding of the limitations of DynamoDB. That doesn’t mean you should avoid using it. The point is that you have to think carefully and design ahead if you are planning to use DynamoDB as the database for your next application.