AWS Database Blog
Migrate document workloads from Oracle SODA to Amazon DocumentDB
In this post, we show you a modernization path for the migration of your JSON workloads from on-premises databases to the AWS Cloud. You can move your document workloads to Amazon DocumentDB (with MongoDB compatibility), and use full capabilities of this purpose-built JSON database.
Amazon DocumentDB is a fully-managed native JSON document database that makes it simple and cost effective to operate critical document workloads at virtually any scale without managing infrastructure. Amazon DocumentDB simplifies your architecture by providing built-in security best practices, continuous backups, and native integrations with other AWS services.
When you migrate from commercial relational databases such as Oracle or SQL Server, you can choose another relational database as the migration target. This is a common path for migrations to the cloud. Many relational databases support the JSON data type. This makes it possible to store relational data and JSON documents inside the same database. However, in the long term this solution might have downsides such as performance degradation, increased costs, and additional complexity. To address these downsides, you can use a purpose-built JSON database such as Amazon DocumentDB.
Oracle SODA and Amazon DocumentDB
For this example, we use a modernization migration from Oracle Simple Object Data Access (SODA) to Amazon DocumentDB. Oracle SODA is a set of APIs that developers use to manage collections of documents in Oracle Database. We discuss similarities and show query examples of Oracle SODA and Amazon DocumentDB. We also show how you can migrate data from Oracle SODA to Amazon DocumentDB using AWS Database Migration Service (AWS DMS). Finally, we show how you can implement extract, transform, and load (ETL) processes using AWS Glue.
Let’s start by comparing Oracle SODA and Amazon DocumentDB query syntax. Many Oracle SODA queries can be run in Amazon DocumentDB without modifications. It may come as a surprise that engineers who are familiar with writing Oracle SODA queries are already familiar with writing Amazon DocumentDB queries.
To demonstrate the capabilities of Oracle SODA and Amazon DocumentDB, we create a collection of cars documents. We insert a sample JSON document to show its structure and fields, which we later use to build sample queries.
Let’s start with creating an Oracle SODA collection and inserting a sample JSON document.
- Connect to Oracle database
- Run the following stored procedure to create a SODA collection and insert a sample car JSON document:
Amazon DocumentDB is a NoSQL database, and you don’t need to create a collection before inserting a document into it. If the collection doesn’t exist, Amazon DocumentDB creates it. You can write the Oracle stored procedure from the previous example as a single Amazon DocumentDB command:
Oracle SODA and Amazon DocumentDB queries
The sample JSON document that we inserted contains the following fields: make
, model
, owner
, array of string features
, and array of parameters
objects (with name
and value
fields).
Oracle SODA and Amazon DocumentDB query syntax share a lot in common. You can run many Oracle SODA queries in Amazon DocumentDB without modifications. Let’s see a few examples.
Let’s start with a basic Oracle SODA Query By Example (QBE). To keep sample codes short instead of iterating over the documents, we will only count them. In the first example, we want to count all cars that meet the following criteria:
- The owner is
John Smith
- The make is
SportForYou
- It has the parameter
DriveType
with a value of4WD
- It has the feature
GPS
In Oracle we use the following code:
When migrating to Amazon DocumentDB, you can copy the filter directly from the preceding Oracle SODA query and pass it as a JSON object to the find
function in Amazon DocumentDB:
The query syntax is exactly the same: query structure, JSON paths, $in
operation. There are many more such examples: logical expressions like $and
, $or
, $nor
, and $not
; arithmetical expressions like $lt
, $gt
, and $between
; and more.
If we want to count how many cars TruckForYou Heavy H1
or SportForYou Bolid 1
are in the Oracle SODA collection, we would use the following code:
The preceding query is exactly the same in Amazon DocumentDB:
Although most operations are similar, there are some differences. For example, if we want to get all of the cars whose make matches Sport
in Oracle SODA, we would use $like
as in the following snippet:
In Amazon DocumentDB, $like
doesn’t exist; instead the preceding query uses $regex
:
To learn more about Oracle SODA filter specification, see Overview of SODA Filter Specifications (QBEs). To learn more about Amazon DocumentDB operators, see Query and Projection Operators.
Oracle SODA and Amazon DocumentDB pagination support
To complement query examples, let’s wrap up with a pagination support example. This is a very common feature used on many websites. Oracle SODA and Amazon DocumentDB support filter, sort, skip, and limit operations.
In Oracle SODA, the JSON query changes its syntax and becomes a nested JSON document with $query
and $orderby
fields, as shown in the following code:
In Amazon DocumentDB, the find
function stays the same and the sort parameters are passed to the sort
function:
Migrate data from Oracle SODA to Amazon DocumentDB
To move document workloads from Oracle to Amazon DocumentDB you can use AWS DMS. AWS DMS is a managed migration and replication service that helps move your database and analytics workloads to AWS quickly, securely, with minimal downtime, and zero data loss. AWS DMS supports migration between many database and analytics engines.
In the following example we use AWS DMS to copy data from our Oracle SODA cars collection to Amazon DocumentDB. When you create a replication task in AWS DMS you need to define a set of mapping rules. These rules instruct AWS DMS how the data should be replicated.
An Oracle SODA collection is implemented as a table and we need to transform two of its columns before they can be replicated to Amazon DocumentDB. The columns and their mapping rules are the following:
- id varchar – This needs to be renamed to
_id
in Amazon DocumentDB using the AWS DMSrename
column transformation - json_document clob – This needs to be changed to
string
data type using the AWS DMSchange-data-type
column transformation
The following code example shows the complete AWS DMS mapping rules for the cars collection:
You don’t need to make additional configurations; AWS DMS automatically converts the columns to JSON fields. Furthermore, columns that start with json_
are coerced as a nested JSON document. AWS DMS puts this coerced JSON document inside the field whose name is the original column name, but without the json_
prefix.
To illustrate this behavior, see the following document, which AWS DMS migrated to Amazon DocumentDB. The source Oracle SODA collection had id
and json_document
columns. The id
column was migrated to _id
and the json_document
column was coerced as a nested document
field:
To learn more about how you can use AWS DMS to replicate data to Amazon DocumentDB, see: Using Amazon DocumentDB as a target for AWS Database Migration Service.
Implement ETL in Amazon DocumentDB
You can use AWS Glue as a fully-managed Apache Spark service to implement complex ETL processes for Amazon DocumentDB.
You can use the write_dynamic_frame
operation to write DynamicFrame
to Amazon DocumentDB just like other supported connection types:
For more information, see Examples: Setting connection types and options.
In your ETL process, you will want to customize the conversion of a flat comma-separated values (CSV) file into a nested JSON document. Using the cars example, but with more fields, let’s say we have the following flat file CSV:
We want to convert it into the following JSON document:
You can use the following snippet in AWS Glue Studio as a custom transformation step. You can use it as a starting point for creating your own nested JSON documents in AWS Glue. Note that the AWS Glue DynamicFrame
doesn’t require developers to create schemas. AWS Glue will infer the schema of the newly created DynamicFrame
automatically.
Benefits of migrating to Amazon DocumentDB
In this section, we discuss the three key benefits of modernizing JSON workloads and migrating them to Amazon DocumentDB.
The first benefit is performance and features completeness. Amazon DocumentDB can outperform relational databases that use the JSON data type by orders of magnitude. In a recent migration project, our team compared the PostgreSQL JSON data type and Amazon DocumentDB. Amazon DocumentDB was able to meet tight SLA requirements even with 14 times more data compared to PostgreSQL. In fact, the database performance is the main reason why purpose-built databases (including graph, time series, and key-value) were built and are so popular and successful. Additionally, as a JSON-native database, Amazon DocumentDB, offers many more JSON features than relational databases do. A simple example would be indexing JSON array fields or running powerful JSON documents aggregation pipelines.
The second benefit of moving to Amazon DocumentDB is the fact that you may already be familiar with Amazon DocumentDB query syntax. In our migration project, we migrated from Oracle SODA and found that things like JSON paths, query structure, array expressions, logical expressions, text operations, or arithmetic expressions were similar. Many queries can run in Amazon DocumentDB without needing changes. This means that you may already be familiar with Amazon DocumentDB without even knowing it. This also means that database engineers with experience in Oracle SODA can quickly get started with Amazon DocumentDB.
The third benefit of moving to Amazon DocumentDB is that you get access to the existing MongoDB ecosystem. There are numerous frameworks, ETL engines, and drivers for virtually every programming language which you can use with Amazon DocumentDB. Amazon DocumentDB supports MongoDB 3.6, 4.0, and 5.0 wire protocols. However, there are some differences between Amazon DocumentDB and MongoDB, for more see Functional Differences: Amazon DocumentDB and MongoDB and Supported MongoDB APIs, Operations, and Data Types. As we showed in this post, you can use the Apache Spark framework to implement your ETL processes and use MongoDB sink to write data to Amazon DocumentDB. If you want to use Apache Spark, then you can use the AWS Glue managed Spark service. These are just a few examples.
Conclusion
In this post, we discussed how you can modernize JSON workloads and migrate them to Amazon DocumentDB. Modernization to Amazon DocumentDB can be simpler than you think, and there are many features that you, your team, and your project can benefit from.
You can follow the Get Started with Amazon DocumentDB guide to get you started with Amazon DocumentDB in less than 15 minutes. Once you have Amazon DocumentDB provisioned, you can run the Amazon DocumentDB queries that we used in this post and see them in practice.
Finally, for more information on Amazon DocumentDB, visit the product page.
About the Author
Lukasz Budnik is a Principal Software Dev Engineer working in the Amazon DMA team. Lukasz works with the biggest Amazon customers on the most challenging cloud migration and modernization programs. In his free time, Lukasz enjoys learning new human languages. ¡Hola amigos!