AWS Database Blog

Accelerate large database migrations to Amazon RDS Custom Oracle using Tsunami UDP

One of the steps in migrating a database to the cloud involves transferring existing data from the source (on premises) to the target (cloud). For medium and large databases, such as those ranging from hundreds of gigabytes to 5 terabytes, speed of data transfer matters. One of the challenges is to minimize the downtime of the application during migration. When migrating databases supporting critical business processes, the impact of delayed data transfer can be significant.

AWS provides a variety of data transfer mechanisms, such as AWS Snow Family, AWS Storage Gateway, and AWS DataSync, to aid in cloud migration. As an alternative, you can transfer your data efficiently during migration using Tsunami UDP, a free, open-source tool that is simple to install and use. Tsunami UDP combines the UDP and TCP protocols: it sends the data over UDP at high speed and uses TCP for control communication. This design lets it achieve more throughput than most TCP-based tools, such as SCP and SFTP, and makes it well suited to moving data across long-distance networks.

In this post, we show you how to use Tsunami UDP to speed up the migration of a large Oracle database to Amazon RDS Custom for Oracle. You can apply the steps in this post to a managed service such as Amazon RDS Custom for Oracle or a self-managed Oracle database on Amazon Elastic Compute Cloud (Amazon EC2).

Solution walkthrough

The following diagram illustrates the solution architecture.

This solution installs Tsunami UDP on the source and destination database servers to enable data transfer. To simulate data transfer over a long-distance network, we use a source RDS Custom for Oracle database running in us-west-2 to represent our on-premises database, and an RDS Custom for Oracle database running in us-east-1 as the destination.

The high-level steps are as follows:

  1. Install Tsunami UDP on the source (on-premises) database instance and on the destination RDS Custom for Oracle DB instance.
  2. Create Oracle database backups on the source RDS Custom for Oracle instance.
  3. Transfer backup files from source to destination using Tsunami UDP.
  4. Restore the transferred backup file on the destination instance.

Prerequisites

For this walkthrough, the following prerequisites are necessary:

  • Background knowledge about Oracle backup and restore
  • An active AWS account
  • Secure network connectivity between source and destination, for example AWS Direct Connect or AWS VPN
  • A source Oracle database instance running on an on-premises environment or Amazon EC2, or an RDS Custom for Oracle instance. See AWS documentation for Setting up your environment for Amazon RDS Custom for Oracle.
  • Amazon RDS Custom for Oracle or Oracle on Amazon EC2 as the destination
  • Git, to download the Tsunami UDP source code. Install it if it is not already present; on RDS Custom, Git comes preinstalled
  • Tsunami UDP software
  • The make command, to compile Tsunami UDP from source code. Install it if it is not already present

Install Tsunami UDP on the source and destination Oracle servers

Tsunami UDP binaries are not distributed; only the source code is available, and it must be compiled on both the source and destination servers. The binaries consume 10 MB of storage space. You don’t need root privileges to compile and run Tsunami UDP; a normal user account is sufficient. Make sure that the operating system user running Tsunami UDP has permission to read the files to be transferred.

As a best practice, install third-party software in new directories under the /rdsdbdata mount point. Doing so ensures that your software is protected, because /rdsdbdata is included in RDS automated and manual snapshots. If installing Tsunami UDP as a non-root user, you need an admin user to create the directory and make your operating system user its owner.

Follow these steps to install Tsunami UDP, first on the server hosting your source Oracle database, and then on the destination Amazon RDS Custom for Oracle instance.

Download the source code

Use the following Git command to download the source code:

$ sudo -i #change to root user

$ mkdir /rdsdbdata/tsunamiUDP #per best practice create directory under rdsdbdata

$ chown rdsdb:rdsdb /rdsdbdata/tsunamiUDP #change ownership to rdsdb or your preferred user

$ ls -lrt /rdsdbdata/

$ sudo -su rdsdb #change to operating system user or your preferred user

$ cd /rdsdbdata/tsunamiUDP #change directory to tsunamiUDP to download TsunamiUDP

$ git clone https://git.code.sf.net/p/tsunami-udp/code_git tsunami-udp-code_git

Alternatively, download the source code directly from the Tsunami UDP repository source. You will see an output similar to the following code when you download Tsunami UDP.

$ ls -lrt /rdsdbdata/tsunamiUDP/tsunami-udp-code_git

$ ls -lrt /rdsdbdata/tsunamiUDP/tsunami-udp-code_git/tsunami-udp

Compile and install Tsunami UDP software

Once downloaded, install the Tsunami UDP software by issuing the make command in the directory holding the source. You will receive an output similar to the following.

You can add the Tsunami UDP server and client to the path in the profile of your operating system user to avoid having to specify the whole path when executing the programs to transfer files.

Do not forget that the above steps need to be followed on both source and destination servers.
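The compile and path steps above can be sketched as two small shell functions. This is a minimal sketch that assumes the directory layout used in this post, with the binaries left in the server/ and client/ subdirectories of the source tree. The function names and the TSUNAMI_SRC variable are illustrative and are not part of Tsunami UDP.

```shell
# Source location used in this post; override TSUNAMI_SRC or pass a path as $1.
TSUNAMI_SRC="${TSUNAMI_SRC:-/rdsdbdata/tsunamiUDP/tsunami-udp-code_git/tsunami-udp}"

# Compile Tsunami UDP by running make in the source directory.
# Returns 1 without building if the directory does not exist.
build_tsunami() {
  src_dir="${1:-$TSUNAMI_SRC}"
  if [ ! -d "$src_dir" ]; then
    echo "source directory not found: $src_dir" >&2
    return 1
  fi
  (cd "$src_dir" && make)
}

# Append the server/ and client/ directories to PATH for the current shell.
add_tsunami_to_path() {
  src_dir="${1:-$TSUNAMI_SRC}"
  PATH="$PATH:$src_dir/server:$src_dir/client"
  export PATH
}
```

After running build_tsunami and add_tsunami_to_path as your operating system user, you can call tsunamid and tsunami without the full path. To make the change permanent, add the PATH line to that user's profile.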

Configure network connectivity between source and destination

On both servers, open TCP ports 22 and 46224, and UDP port 46224 to allow seamless flow of Tsunami UDP traffic.

In this example, we open the required TCP and UDP ports by configuring security group rules at the source and destination. Work with your network administrator to configure the required rules if you are following this walkthrough using an on-premises source.
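If you manage security groups with the AWS CLI, the rules could look like the commands printed by the following sketch. The function only prints the commands so that you can review them before running anything. The security group ID and CIDR range in the usage line are placeholders, and the function name is illustrative.

```shell
# Print (without running) the AWS CLI calls that open the ports used in this post.
# $1 = security group ID, $2 = CIDR range of the peer server (both placeholders).
print_tsunami_sg_rules() {
  sg_id="$1"
  peer_cidr="$2"
  for rule in "tcp 22" "tcp 46224" "udp 46224"; do
    proto="${rule% *}"   # text before the space: protocol
    port="${rule#* }"    # text after the space: port
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol $proto --port $port --cidr $peer_cidr"
  done
}

# Example with placeholder values:
print_tsunami_sg_rules sg-0123456789abcdef0 10.0.0.0/16
```

Restrict the CIDR range to the address of the peer server where possible, rather than opening the ports to a wide range.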

Create an Oracle database backup on the source

We used the following Oracle Data Pump command to unload data and metadata from our sample database into an operating system dump file:

expdp directory=datapump dumpfile=source.dmp logfile=source.imp full=y

The following screenshot shows part of the output of the preceding command, with a successful Data Pump export.

In this example, the size of the export dump backup file is 463 GB.

Transfer the files using Tsunami UDP

Tsunami UDP uses a two-step process to transfer files. The first step is to start the Tsunami server on the source server using the tsunamid command. The second step is to start the Tsunami client on the destination using the tsunami command.

To transfer the files, complete the following steps:

  1. On the source server, identify the files to transfer. In this example, we transfer the 463 GB source.dmp dump file.

Tsunami UDP allows you to transfer one or more files with a single command. For details on the Tsunami UDP commands, refer to the usage guide.

  2. Navigate to the path where the export dump files are located and start the Tsunami UDP server process. In our example, we transfer one backup file with the following command:

$ cd /rdsdbdata/datapump/ #change directory to location of the dump file

$ /rdsdbdata/tsunamiUDP/tsunami-udp-code_git/tsunami-udp/server/tsunamid source.dmp

By default, the client is not asked for a password when connecting to the server. You can override this behavior by specifying the --secret option when starting the Tsunami UDP server. For our example, the command would be:

$ /rdsdbdata/tsunamiUDP/tsunami-udp-code_git/tsunami-udp/server/tsunamid --secret <your_complex_password> source.dmp

The output message lets you know Tsunami UDP is ready to send the data and is waiting for a client to connect to it.

  3. On the destination server, navigate to the location where the files need to be received and start the Tsunami UDP client process:

$ cd /rdsdbdata/datapump/

$ /rdsdbdata/tsunamiUDP/tsunami-udp-code_git/tsunami-udp/client/tsunami

The following screenshot shows the output of the tsunami command.

  4. Next, establish a connection with the source IP or hostname, and receive the files using the following commands:

tsunami> connect <ip_address>
tsunami> get *

If the Tsunami UDP server was started with the --secret option, then you must specify the password before connecting to the server. In our example, the command would be:

tsunami> set passphrase <your_complex_password>
tsunami> connect <ip_address>
tsunami> get *

The following is a screenshot of the output displayed while the file is being transferred.

On completion, you will see transfer stats like the following.

On the source server, you will see the following output when the client establishes the connection.

You’ll see the following when the file transfer is initiated.

When the file transfer is complete, you’ll see the transfer details.
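Before restoring, you may want to confirm that the file arrived intact. This check is not part of the original walkthrough. The following sketch assumes the sha256sum utility is available on both servers; the function names are illustrative. Run file_sha256 on the source, then pass the resulting digest to verify_sha256 on the destination.

```shell
# Print the SHA-256 digest of a file (first field of the sha256sum output).
file_sha256() {
  sha256sum "$1" | awk '{print $1}'
}

# Compare a local file against a digest computed on the other server.
# $1 = file path, $2 = expected digest. Returns 0 on match, 1 on mismatch.
verify_sha256() {
  file="$1"
  expected="$2"
  actual="$(file_sha256 "$file")"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}
```

For example, run file_sha256 /rdsdbdata/datapump/source.dmp on the source server, then run verify_sha256 /rdsdbdata/datapump/source.dmp followed by that digest on the destination. Hashing a file of several hundred gigabytes takes time, so allow for it in your migration window.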

Restore the database

Restore the database using the Oracle Data Pump impdp command:

impdp directory=datapump dumpfile=source.dmp logfile=impsource.log full=y

Clean up

Make sure to remove any resources such as the EC2 instance or RDS Custom for Oracle database you created, should you no longer need them, to avoid costs.

Limitations

Tsunami UDP comes with the following limitations that need to be considered when choosing which mechanism to use to transfer files:

  • Tsunami UDP is single-threaded.
  • It doesn’t provide its own encryption. It should only be used on a private secure network, or on encrypted networks such as AWS Direct Connect or AWS Site-to-Site VPN, especially if you are transmitting sensitive data. For added protection, you can also encrypt the files to be transmitted using any encryption tool you are familiar with.
  • It’s designed to transfer large datasets sequentially. With small datasets, it achieves lower throughput.
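As one possible way to encrypt a file before transfer, the following sketch uses the openssl command line tool with a passphrase stored in a file. It assumes OpenSSL 1.1.1 or later, because older versions do not support the -pbkdf2 option. The function names and file paths are illustrative; any encryption tool you trust works equally well.

```shell
# Encrypt $1 into $2 with AES-256-CBC; the passphrase is read from file $3.
encrypt_file() {
  openssl enc -aes-256-cbc -salt -pbkdf2 -in "$1" -out "$2" -pass "file:$3"
}

# Decrypt $1 into $2 using the same passphrase file $3.
decrypt_file() {
  openssl enc -d -aes-256-cbc -pbkdf2 -in "$1" -out "$2" -pass "file:$3"
}
```

You would run encrypt_file on the source before starting tsunamid, transfer the encrypted file, and run decrypt_file on the destination before the restore. The encrypted copy needs about as much disk space as the original, so check free space first.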

Summary

During our testing we transferred our 463 GB Oracle backup file in less than 97 minutes between two db.m5.xlarge RDS Custom for Oracle instances hosted in different Regions. This same file took 256 minutes to transmit using SCP. These timings can vary depending on your network.
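To put these timings in perspective, you can convert them to average transfer rates. The following sketch assumes decimal units (1 GB = 1000 MB), which the post does not specify. It gives roughly 79.6 MB/s for Tsunami UDP and 30.1 MB/s for SCP, a speedup of about 2.6 times. Because the Tsunami UDP transfer took less than 97 minutes, its figure is a lower bound.

```shell
# Average transfer rate in MB/s for a size in GB and a duration in minutes.
# Assumes decimal units (1 GB = 1000 MB).
transfer_rate_mb_s() {
  awk -v gb="$1" -v min="$2" 'BEGIN { printf "%.1f\n", (gb * 1000) / (min * 60) }'
}

echo "Tsunami UDP: $(transfer_rate_mb_s 463 97) MB/s"    # 79.6 MB/s
echo "SCP:         $(transfer_rate_mb_s 463 256) MB/s"   # 30.1 MB/s
awk 'BEGIN { printf "Speedup:     %.1fx\n", 256 / 97 }'  # 2.6x
```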

This post demonstrated how Tsunami UDP provides a simple and efficient method to transmit backup files quickly when migrating large databases from on premises to the AWS Cloud or between different Regions.

AWS provides several cloud storage solutions that address Tsunami UDP limitations and help businesses increase efficiency and improve scalability. AWS Snow Family offers physical devices that can be used to transfer large amounts of data. AWS DataSync provides an automated way to transfer data between on-premises storage systems and AWS cloud storage. AWS Storage Gateway provides a virtual storage appliance that can be used to connect on-premises applications to AWS cloud storage.

If you have any comments or feedback, leave them in the comments section.


About the authors

Lanre Showunmi is a Sr. DB Specialty Architect with AWS Professional Services. He helps customers and partners build robust solutions in the AWS Cloud.

Jose Amado-Blanco is a Sr. Consultant on Database Migration with AWS Professional Services and has over 25 years of experience. He helps customers on their journey to migrate and modernize their database solutions from on premises to AWS.

Santhosh Kasarla is a Database Consultant who specializes in database migration and modernization. He supports and enables customers to build resilient, cost-effective, and optimized databases in AWS. Santhosh is passionate about learning new technologies and automation.