AWS Database Blog

Accelerate migrations to Amazon DocumentDB using AWS DMS

Amazon DocumentDB (with MongoDB compatibility) is a fully managed native JSON document database that makes it easy and cost effective to operate critical document workloads at virtually any scale without managing infrastructure. Amazon DocumentDB simplifies your architecture by providing built-in security best practices, continuous backups, and native integrations with other AWS services.

AWS Database Migration Service (AWS DMS) is a managed migration and replication service that helps move your database and analytics workloads to AWS quickly, securely, and with minimal downtime. AWS DMS supports migration between 20-plus database and analytics engines including Amazon DocumentDB.

As customers migrate an increasing number of workloads to Amazon DocumentDB using AWS DMS, they are asking for the ability to continuously replicate data from sources with a high rate of ongoing changes. With the launch of the Change Data Capture (CDC) Parallel Apply feature in AWS DMS 3.5.1, you can increase CDC throughput by adding parallel threads to the AWS DMS task. The throughput of continuous data migration is then driven largely by the number of threads you configure.

In this post, we discuss how to configure parallel apply threads on an AWS DMS task to increase CDC throughput. In a previous post, we discussed how to improve the performance of migrating existing data using parallel full load.

Prerequisites

To follow along with this post, you should have a basic understanding of how AWS DMS works. If you’re just getting started with AWS DMS, review the AWS DMS documentation.

Configure CDC Parallel Apply settings on the AWS DMS task

AWS DMS supports the following task settings to increase CDC throughput:

  • ParallelApplyThreads – Specifies the number of concurrent threads that AWS DMS uses during a CDC load to apply changes to the Amazon DocumentDB target endpoint. The default value is zero and the maximum value is 32. This is the main parameter affecting CDC apply throughput. Start with a value equal to the number of vCPUs on the replication instance, and then tune it to achieve your desired throughput.
  • ParallelApplyBufferSize – Specifies the maximum number of records to store in each buffer queue for the parallel apply threads to push to an Amazon DocumentDB target endpoint during CDC. The default value is 100 and the maximum value is 1,000. Use this option when ParallelApplyThreads specifies more than one thread. The default value is good for most workloads.
  • ParallelApplyQueuesPerThread – Specifies the number of queues that each thread accesses to take data records out of queues and generate a batch load for an Amazon DocumentDB endpoint during CDC. The default is 1 and the maximum is 512. The default value is good for most workloads.
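To make the ranges and the suggested starting point concrete, the following Python sketch keeps the three values together and rejects anything outside the documented limits. The class name `ParallelApplySettings` is hypothetical. The minimum of 1 for buffer size and queues per thread is an assumption. Placing the keys under a `TargetMetadata` section is also an assumption, based on how other AWS DMS target settings are organized.

```python
from dataclasses import dataclass

# Maximums taken from the setting descriptions above. The minimums of 1 for
# buffer size and queues per thread are ASSUMPTIONS, not documented values.
MAX_APPLY_THREADS = 32
MAX_BUFFER_SIZE = 1000
MAX_QUEUES_PER_THREAD = 512


@dataclass(frozen=True)
class ParallelApplySettings:
    """Hypothetical holder for CDC parallel apply settings (DocumentDB target)."""

    threads: int = 0            # ParallelApplyThreads, default 0
    buffer_size: int = 100      # ParallelApplyBufferSize, default 100
    queues_per_thread: int = 1  # ParallelApplyQueuesPerThread, default 1

    def __post_init__(self) -> None:
        # Reject values outside the ranges described above.
        if not 0 <= self.threads <= MAX_APPLY_THREADS:
            raise ValueError(f"ParallelApplyThreads must be 0-{MAX_APPLY_THREADS}")
        if not 1 <= self.buffer_size <= MAX_BUFFER_SIZE:
            raise ValueError(f"ParallelApplyBufferSize must be 1-{MAX_BUFFER_SIZE}")
        if not 1 <= self.queues_per_thread <= MAX_QUEUES_PER_THREAD:
            raise ValueError(
                f"ParallelApplyQueuesPerThread must be 1-{MAX_QUEUES_PER_THREAD}"
            )

    @classmethod
    def starting_point(cls, vcpus: int) -> "ParallelApplySettings":
        """One apply thread per replication-instance vCPU, capped at the maximum."""
        if vcpus < 1:
            raise ValueError("vcpus must be positive")
        return cls(threads=min(vcpus, MAX_APPLY_THREADS))

    def as_task_settings(self) -> dict:
        """Task settings fragment.

        ASSUMPTION: these keys belong in the TargetMetadata section.
        """
        return {
            "TargetMetadata": {
                "ParallelApplyThreads": self.threads,
                "ParallelApplyBufferSize": self.buffer_size,
                "ParallelApplyQueuesPerThread": self.queues_per_thread,
            }
        }
```

For example, `ParallelApplySettings.starting_point(8)` yields 8 threads with the default buffer size and queue count. The dms.c5.2xlarge replication instance used in the test below has 8 vCPUs, which matches the value of 8 used in this post.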

On the AWS DMS task creation page, in the Task settings section, select JSON editor and update the following settings:

  • Set ParallelApplyThreads to 8
  • Set ParallelApplyBufferSize to 100
  • Set ParallelApplyQueuesPerThread to 1

Figure: Task settings in the JSON editor with the ParallelApply* values set
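If you manage task settings as files or through the API rather than the console, you can script the same change. The following Python sketch merges the three values into an existing task settings JSON document and leaves all other sections untouched. It assumes the ParallelApply* keys belong in the `TargetMetadata` section of the task settings. The function name `apply_parallel_apply_settings` is hypothetical.

```python
import json


def apply_parallel_apply_settings(
    task_settings_json: str,
    threads: int = 8,
    buffer_size: int = 100,
    queues_per_thread: int = 1,
) -> str:
    """Return task settings JSON with the ParallelApply* values set.

    ASSUMPTION: the ParallelApply* keys live under TargetMetadata.
    Every other section and key in the input is preserved as-is.
    """
    # Treat an empty document as an empty settings object.
    settings = json.loads(task_settings_json) if task_settings_json.strip() else {}
    # Create TargetMetadata if missing, then overwrite only the three keys.
    target_metadata = settings.setdefault("TargetMetadata", {})
    target_metadata.update(
        {
            "ParallelApplyThreads": threads,
            "ParallelApplyBufferSize": buffer_size,
            "ParallelApplyQueuesPerThread": queues_per_thread,
        }
    )
    return json.dumps(settings, indent=2)
```

You can paste the returned string into the JSON editor. You can also pass it as the `ReplicationTaskSettings` parameter of the boto3 DMS client's `create_replication_task` or `modify_replication_task` call. A task must be stopped before you can modify it.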

Sample performance improvement test results

We used the following setup to test our performance improvement:

  • AWS DMS replication instance: dms.c5.2xlarge
  • AWS DMS version: 3.5.1
  • Target Amazon DocumentDB cluster instance: db.r6g.2xlarge
  • Target Amazon DocumentDB version: 5.0

Note that increasing the number of parallel CDC apply threads increases resource utilization on both the replication instance and the target cluster, because changes are applied more aggressively. Size both the replication instance and the Amazon DocumentDB cluster instances according to your CDC throughput needs.

The following figure shows the number of CDC operations applied by AWS DMS per second (y-axis) for an insert-only workload at different ParallelApplyThreads values (x-axis).

Figure: CDC operations applied per second vs. ParallelApplyThreads, insert-only workload

The following figure shows the number of CDC operations applied by AWS DMS per second (y-axis) for a mixed insert and update workload at different ParallelApplyThreads values (x-axis).

Figure: CDC operations applied per second vs. ParallelApplyThreads, mixed insert and update workload

Conclusion

Parallel apply threads can significantly improve the throughput of continuous data migration tasks. Test with your own workload in a lower environment under production-like conditions. Factors such as workload size, rate of change, and the sizes of the target and replication instances all affect the performance improvement that you can achieve with the ParallelApply* settings.
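The tuning approach described earlier is to start from the vCPU count and then test with your own workload. You can organize that approach as a simple sweep over thread counts. The following Python sketch assumes you supply a hypothetical `measure` callable. That callable reruns your test workload with a given ParallelApplyThreads value and returns the CDC operations per second you observed, for example from your own monitoring. The sketch does not call AWS DMS itself, and `pick_thread_count` is a hypothetical name.

```python
from typing import Callable, Iterable, Optional, Tuple


def pick_thread_count(
    measure: Callable[[int], float],
    candidates: Iterable[int] = (1, 2, 4, 8, 16, 32),
    target_ops_per_sec: Optional[float] = None,
) -> Tuple[int, float]:
    """Sweep ParallelApplyThreads values and return (threads, ops_per_sec).

    `measure` is a hypothetical, user-supplied function that runs the test
    workload with the given thread count and returns observed CDC ops/sec.
    If target_ops_per_sec is set, return the smallest thread count that meets
    it; otherwise (or if no candidate meets it) return the best one observed.
    """
    best: Optional[Tuple[int, float]] = None
    for threads in sorted(candidates):
        if not 1 <= threads <= 32:
            raise ValueError("ParallelApplyThreads candidates must be 1-32")
        ops = measure(threads)
        # Fewer threads means less load on the replication instance and target.
        if target_ops_per_sec is not None and ops >= target_ops_per_sec:
            return threads, ops
        # Keep the smallest thread count among ties for the highest throughput.
        if best is None or ops > best[1]:
            best = (threads, ops)
    if best is None:
        raise ValueError("no candidate thread counts given")
    return best
```

Preferring the smallest thread count that meets your target reflects the earlier note that more threads also mean higher resource utilization on both the replication instance and the target cluster.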

You can achieve fast, low-downtime online migrations to Amazon DocumentDB with AWS DMS by combining the ParallelApply* settings for CDC with segmentation for full load.

For more information about these features and AWS DMS, see the AWS DMS documentation. We also recommend reviewing Segmenting MongoDB collections and migrating in parallel and Segmenting Amazon DocumentDB collections and migrating in parallel.


About the author

Sourav Biswas is a Senior DocumentDB Specialist Solutions Architect at Amazon Web Services (AWS). He has been helping Amazon DocumentDB customers successfully adopt the service and implement best practices around it. Before joining AWS, he worked extensively as an application developer and solutions architect for various NoSQL vendors.