AWS Database Blog

Implement active-active replication with RDS Custom for Oracle: Part 2 – High Availability & Disaster Recovery

In this post, we extend the architecture discussed in Implement active-active replication with RDS Custom for Oracle: Part 1 – High Availability, where we implemented a multi-master, highly available Amazon RDS Custom for Oracle solution. This post shows you how to add high availability (HA) and disaster recovery (DR) using Oracle Data Guard on RDS Custom for Oracle.

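The shared Oracle GoldenGate (GoldenGate) files are stored on Amazon Elastic File System (Amazon EFS), which makes GoldenGate highly available. We also use Data Guard Fast-Start Failover (FSFO) with an observer, a separate process that monitors the primary and standby databases and initiates a failover when the primary becomes unavailable. The Data Guard broker configuration records the failover target, how long to wait after a failure before triggering a failover, and other FSFO-specific properties.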

Solution overview

Figure 1 shows two primary RDS Custom for Oracle instances in Availability Zone (AZ) 1A, one in each of two Regions. Each primary has an active standby in AZ 1B of the same Region. Each Region has its own EFS file system, which is mounted on both the primary and the standby RDS Custom instance in that Region. GoldenGate binaries are installed on all the RDS Custom instances (primary and standby), and the shared GoldenGate files are created on Amazon EFS to provide high availability.

We also set up GoldenGate bi-directional replication between the primary RDS Custom instances in the two Regions. We use an Amazon Simple Storage Service (Amazon S3) bucket as a landing area and object file storage; the bucket is shared by the primary and standby databases within a Region. In each Region, the observer runs on a separate Amazon Elastic Compute Cloud (Amazon EC2) instance in a third Availability Zone (AZ 1C and AZ 2C). The automated backup feature of Amazon RDS Custom for Oracle backs up your databases and archives redo logs securely in Amazon S3 for a user-specified retention period.

Figure 1 – Multi-master using Oracle GoldenGate

This solution offers the following benefits and features:

  • Write scaling with an active-active setup, using GoldenGate bi-directional replication.
  • High availability: if an event degrades the workload in one Availability Zone, running from another AZ in the same Region mitigates it.
  • Disaster recovery: if an event degrades the workload across a Region, running from the other Region mitigates it.
  • FSFO and the observer provide automatic failover in the event of failure.
  • We use Amazon EFS to provide the HA functionality to GoldenGate.
  • Because the shared GoldenGate files are on Amazon EFS, GoldenGate can be restarted on the new primary after a role transition with zero data loss.
  • Read scalability by using Active Data Guard.
  • You can create up to five managed Oracle replicas of your RDS Custom for Oracle primary DB instance. You can also create your own manually configured (external) Oracle replicas. External replicas don’t count toward your instance limit. Note that RDS Custom for Oracle detects instance role changes upon manual failover (such as FSFO) for managed Oracle replicas, but not for external Oracle replicas.
  • You can further improve application availability by using Elastic Load Balancing (such as an Application Load Balancer) or Amazon Route 53, depending on your Recovery Point Objective (RPO) and Recovery Time Objective (RTO) requirements.

This architecture contains the following components:

  • Application Load Balancer – Deployed across two Availability Zones for HA
  • Application tier – EC2 servers in two Availability Zones
  • Two primary RDS Custom for Oracle instances – One in each Region
  • Two standby RDS Custom for Oracle instances – A Data Guard read replica for each primary, in a separate Availability Zone of the same Region
  • EC2 instances – In each Region, one EC2 instance runs the observer. You can add another EC2 instance in a different AZ to make the observer highly available as well.
  • Amazon EFS – A shared file system between the primary and standby RDS Custom instances in each Region
  • GoldenGate – Used for bi-directional replication
  • Amazon S3 – Object storage for backup and files
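The placement rules in this architecture are simple enough to check mechanically. The following Python sketch models each Region's layout and flags violations of the rules described above: primary, standby, and observer in three different AZs, and EFS mounted on both database instances. The `RegionLayout` class, the function names, and the AZ labels 2A/2B are hypothetical illustrations, not part of RDS Custom or any AWS API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegionLayout:
    """Placement of one Region's components (all names are hypothetical placeholders)."""
    region: str
    primary_az: str       # RDS Custom for Oracle primary
    standby_az: str       # Data Guard standby (active standby / read replica)
    observer_az: str      # EC2 instance running the FSFO observer
    efs_on_primary: bool  # EFS file system mounted on the primary
    efs_on_standby: bool  # EFS file system mounted on the standby


def placement_problems(layout: RegionLayout) -> List[str]:
    """Return the placement rules this Region violates; an empty list means it passes."""
    problems: List[str] = []
    if len({layout.primary_az, layout.standby_az, layout.observer_az}) < 3:
        problems.append(f"{layout.region}: primary, standby, and observer need three different AZs")
    if not (layout.efs_on_primary and layout.efs_on_standby):
        problems.append(f"{layout.region}: EFS must be mounted on both primary and standby")
    return problems


def architecture_problems(layouts: List[RegionLayout]) -> List[str]:
    """Check the two-Region active-active layout as a whole."""
    problems: List[str] = []
    if len({x.region for x in layouts}) != 2:
        problems.append("this design uses GoldenGate between exactly two Regions")
    for layout in layouts:
        problems.extend(placement_problems(layout))
    return problems


if __name__ == "__main__":
    design = [
        RegionLayout("region-1", "1A", "1B", "1C", True, True),
        RegionLayout("region-2", "2A", "2B", "2C", True, True),
    ]
    print(architecture_problems(design) or "layout matches Figure 1")
```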

Deploy the solution

To deploy the solution architecture, complete the following steps:

Steps to create the first Region architecture

In this section, we outline the steps to create the primary and standby RDS Custom for Oracle databases, set up FSFO and Amazon EFS, and install GoldenGate in the first Region.

  1. Create the first RDS Custom for Oracle primary database in one Availability Zone of the Region chosen for the deployment of the active-active architecture.
  2. Create an RDS Custom for Oracle standby database in another Availability Zone of the same Region chosen in Step 1 and set up synchronous or asynchronous replication with the primary instance.
  3. Create an EC2 instance in a third Availability Zone of the Region chosen in Step 1.
  4. Implement FSFO and an observer on the EC2 instance created in Step 3. For more information, refer to Build high availability for Amazon RDS Custom for Oracle using read replicas and the whitepaper Enabling High Availability with Data Guard on Amazon RDS Custom for Oracle.
  5. Create an EFS file system in the same Region as Step 1. For details, refer to Integrate Amazon RDS Custom for Oracle with Amazon EFS.
  6. Install GoldenGate binary files on both the primary and standby RDS Custom instances created in Step 1 and Step 2.
  7. Create the GoldenGate shared directories on the EFS file system (refer to the steps in Part 1).
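Step 4 is normally carried out in the Data Guard broker command-line interface (DGMGRL). As a minimal sketch, the Python function below renders broker commands that enable FSFO in Maximum Availability mode with synchronous redo transport. It assumes the broker configuration already exists, that Flashback Database is enabled on both databases, and that `orcl_a`/`orcl_b` are hypothetical db_unique_name values; the whitepaper referenced in step 4 remains the authoritative procedure.

```python
from typing import List


def fsfo_script(primary: str, standby: str, threshold_s: int = 30, net_timeout_s: int = 30) -> str:
    """Return DGMGRL commands that enable FSFO between two databases (hypothetical names)."""
    if threshold_s < 6:
        # 6 seconds is the lowest value FastStartFailoverThreshold accepts.
        raise ValueError("FastStartFailoverThreshold must be at least 6 seconds")
    cmds: List[str] = []
    for db in (primary, standby):
        # Synchronous redo transport, which this post recommends for zero RPO.
        cmds.append(f"EDIT DATABASE {db} SET PROPERTY LogXptMode='SYNC';")
        # Seconds LGWR waits for the standby before treating the connection as lost.
        cmds.append(f"EDIT DATABASE {db} SET PROPERTY NetTimeout={net_timeout_s};")
    # Each database names the other as its FSFO target, so FSFO still works after a role change.
    cmds.append(f"EDIT DATABASE {primary} SET PROPERTY FastStartFailoverTarget='{standby}';")
    cmds.append(f"EDIT DATABASE {standby} SET PROPERTY FastStartFailoverTarget='{primary}';")
    cmds.append("EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;")
    cmds.append(f"EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold={threshold_s};")
    cmds.append("ENABLE FAST_START FAILOVER;")
    return "\n".join(cmds)


if __name__ == "__main__":
    print(fsfo_script("orcl_a", "orcl_b"))
```

You could paste the generated commands into a DGMGRL session connected to the primary, and then run `START OBSERVER` from DGMGRL on the observer EC2 instance.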

Steps to create the second Region architecture

In this section, we outline the steps to create the primary and standby RDS Custom for Oracle databases, set up FSFO and Amazon EFS, and install GoldenGate in the second Region.

  1. Create the second RDS Custom for Oracle primary database in one Availability Zone of the second Region chosen for the deployment of the active-active architecture.
  2. Create an RDS Custom for Oracle standby database in another Availability Zone of the same Region chosen in Step 1 and set up synchronous or asynchronous replication with the primary instance created in Step 1.
  3. Create an EC2 instance in another Availability Zone of the Region chosen in Step 1.
  4. Implement FSFO and an observer on the EC2 instance created in Step 3.
  5. Create the second EFS file system in the same Region as Step 1.
  6. Install GoldenGate binary files on both the primary and standby RDS Custom instances created in Step 1 and Step 2.
  7. Create the GoldenGate shared directories on the EFS file system.
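In both Regions, steps 5 and 7 come down to mounting EFS on the primary and standby hosts and pointing the GoldenGate state directories at it. The sketch below assumes the amazon-efs-utils mount helper is installed and that, as in a classic GoldenGate installation, the shared directories are symbolically linked from the GoldenGate home into EFS. The directory list, file system ID, and paths are hypothetical; use the directories you created in Part 1.

```python
import posixpath
from typing import Dict, List

# Assumption: classic-architecture GoldenGate subdirectories holding state that must survive
# a role change (parameter files, checkpoints, trails, credential store, wallet, bounded recovery).
SHARED_GG_DIRS: List[str] = ["dirprm", "dirchk", "dirdat", "dircrd", "dirwlt", "BR"]


def efs_fstab_line(file_system_id: str, mount_point: str) -> str:
    """Return an /etc/fstab entry for the amazon-efs-utils mount helper (TLS in transit)."""
    return f"{file_system_id}:/ {mount_point} efs _netdev,noresvport,tls 0 0"


def symlink_plan(gg_home: str, efs_gg_root: str) -> Dict[str, str]:
    """Map each local GoldenGate subdirectory (link) to its shared EFS location (target)."""
    return {posixpath.join(gg_home, d): posixpath.join(efs_gg_root, d) for d in SHARED_GG_DIRS}


if __name__ == "__main__":
    # Hypothetical IDs and paths; apply the same plan on both the primary and the standby host.
    print(efs_fstab_line("fs-0123456789abcdef0", "/efs"))
    for link, target in symlink_plan("/u01/app/ogg", "/efs/ogg").items():
        print(f"ln -s {target} {link}")
```

Because both hosts resolve the same links to the same EFS paths, whichever instance becomes primary sees identical checkpoints and trail files.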

Steps required in both Regions

In this section, we configure GoldenGate bi-directional replication and application connectivity.

  1. Configure GoldenGate bi-directional replication between the RDS Custom primary instances created in the previous sections. For detailed steps, refer to Part 1.
  2. Configure the application to connect to either of the primary RDS Custom instances.
  3. The read-only workload can also connect to either of the RDS Custom standby databases if they’re configured as an active standby database.
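For step 2, one common approach is an Oracle Net connect descriptor that lists both database hosts in a Region, so clients reach whichever one currently offers the read/write service. The Python sketch below builds such a descriptor. The host names, port, and the `app_rw` service name are hypothetical, and the approach assumes the service is started only on the database in the PRIMARY role (for example, through a role-change trigger); without that, clients could connect to the standby.

```python
from typing import List


def connect_descriptor(hosts: List[str], service_name: str, port: int = 1521,
                       connect_timeout_s: int = 5, retry_count: int = 3) -> str:
    """Build an Oracle Net descriptor that tries the hosts in order (no load balancing)."""
    if not hosts:
        raise ValueError("at least one host is required")
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "(DESCRIPTION="
        f"(CONNECT_TIMEOUT={connect_timeout_s})(RETRY_COUNT={retry_count})"
        "(FAILOVER=on)"  # move to the next address if a connection attempt fails
        f"(ADDRESS_LIST=(LOAD_BALANCE=off){addresses})"  # keep the listed order
        f"(CONNECT_DATA=(SERVICE_NAME={service_name}))"
        ")"
    )


if __name__ == "__main__":
    print(connect_descriptor(["primary-1a.example.internal", "standby-1b.example.internal"], "app_rw"))
```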

Failover scenario

For this use case, we assume that the 1A Availability Zone has degraded performance. The sequence of events is as follows:

  • The observer detects that the primary is unavailable and, once the FastStartFailoverThreshold period has elapsed, initiates the failover.
  • FSFO switches the database roles. The standby RDS Custom database in AZ 1B becomes the primary, and the former primary in AZ 1A becomes the standby after the failure is resolved and it is reinstated.
  • Manually restart GoldenGate on AZ 1B (the new primary); it processes all the unapplied trail files from the EFS file share.
  • The observer starts monitoring the new primary.
  • Applications can start connecting to the new primary in AZ 1B for read/write loads.
  • Applications can start connecting to the new standby in AZ 1A for read-only loads.
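The failover sequence above can be summarized as a small state model. This Python sketch is illustrative only: it does not call Data Guard or GoldenGate, it assumes the default 30-second FastStartFailoverThreshold, and all class and function names are hypothetical. It highlights two operational points: the observer waits for the threshold before failing over, and GoldenGate stays down until someone restarts it on the new primary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RegionState:
    roles: Dict[str, str]         # AZ -> "PRIMARY", "STANDBY", or "DOWN"
    goldengate_on: Optional[str]  # AZ whose instance runs GoldenGate, if any
    events: List[str] = field(default_factory=list)

    def az_with(self, role: str) -> Optional[str]:
        return next((az for az, r in self.roles.items() if r == role), None)


def observer_tick(state: RegionState, seconds_unreachable: int, threshold_s: int = 30) -> None:
    """Fail over only once the primary has been unreachable for threshold_s seconds."""
    old_primary, target = state.az_with("PRIMARY"), state.az_with("STANDBY")
    if seconds_unreachable < threshold_s or old_primary is None or target is None:
        return
    state.roles[old_primary] = "DOWN"  # reinstated as a standby once the AZ recovers
    state.roles[target] = "PRIMARY"
    state.goldengate_on = None         # GoldenGate stopped with the failed primary
    state.events.append(f"FSFO: {target} is the new primary")


def restart_goldengate(state: RegionState) -> None:
    """Manual step: start GoldenGate on the new primary; checkpoints and trails are on EFS."""
    state.goldengate_on = state.az_with("PRIMARY")
    state.events.append(f"GoldenGate restarted in {state.goldengate_on}")


def reinstate(state: RegionState, az: str) -> None:
    """After the failure is resolved, the former primary rejoins as the standby."""
    state.roles[az] = "STANDBY"
    state.events.append(f"{az} reinstated as standby")


if __name__ == "__main__":
    s = RegionState({"1A": "PRIMARY", "1B": "STANDBY"}, goldengate_on="1A")
    observer_tick(s, seconds_unreachable=31)
    restart_goldengate(s)
    reinstate(s, "1A")
    print(s.roles, s.goldengate_on, s.events)
```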

Figure 2 shows the architecture after the failover. In this architecture, the RDS Custom for Oracle primary instance and the standby instance have switched roles.

Figure 2 – After failover – Multi-master using Oracle GoldenGate

Recommendations

Note the following recommendations when implementing this solution:

  • For zero RPO, Data Guard replication between the primary and standby should be in synchronous mode.
  • To reduce the RTO and achieve transparent application failover, we recommend using services like Route 53, Oracle Connection Manager, or third-party solutions like Heimdall Proxy.
  • RDS Custom for Oracle detects instance role changes upon manual failover (such as FSFO) for managed Oracle replicas, but not for external Oracle replicas.
  • For FSFO environments, set db_flashback_retention_target (specified in minutes) to 60 or higher to provide sufficient Flashback Database history for automatic standby reinstatement.
  • Flashback Database stores its logs in the fast recovery area (FRA), so the FRA must be large enough to hold at least 60 minutes of Flashback Database history.
  • If you want the traffic among the Data Guard instances to be encrypted, configure a VPN tunnel to encrypt the communication.
  • The NetTimeout property specifies the number of seconds Oracle log writer (LGWR) will block waiting for acknowledgment from the standby in synchronous mode before considering the connection lost (corresponds to the NET_TIMEOUT option of log_archive_dest_n). The default value is 30 seconds. When using Maximum Availability mode, consider lowering this to reduce the time commits block when the standby becomes unavailable. Choose a value high enough to avoid false disconnects from intermittent network trouble.
  • Set the FastStartFailoverThreshold property to specify the number of seconds you want the observer and target standby database to wait (after detecting that the primary database is unavailable) before initiating a failover. The default value for the FastStartFailoverThreshold property is 30 seconds, and the lowest possible value is 6 seconds.
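The recommendations above include a few numeric rules that are easy to check before a failover test. The Python sketch below encodes them and adds a rough FRA sizing estimate. The flashback generation rate is an input you would measure yourself (for example, from the FLASHBACK_DATA column of V$FLASHBACK_DATABASE_STAT); the `other_fra_gb` allowance for archived redo and backups and the 20% headroom are assumptions of this sketch, not Oracle or AWS guidance.

```python
import math
from typing import List


def fsfo_setting_warnings(flashback_retention_min: int, fsfo_threshold_s: int,
                          net_timeout_s: int, log_xpt_mode: str) -> List[str]:
    """Compare settings with the recommendations in this post; returns human-readable warnings."""
    warnings: List[str] = []
    if flashback_retention_min < 60:
        warnings.append("db_flashback_retention_target should be 60 minutes or higher")
    if fsfo_threshold_s < 6:
        warnings.append("FastStartFailoverThreshold cannot be lower than 6 seconds")
    if log_xpt_mode.upper() != "SYNC":
        warnings.append("this post recommends synchronous (SYNC) redo transport for zero RPO")
    elif net_timeout_s >= 30:
        warnings.append("in Maximum Availability mode, consider a NetTimeout below the 30-second default")
    return warnings


def min_fra_size_gb(flashback_mb_per_min: float, retention_min: int = 60,
                    other_fra_gb: float = 0.0, headroom: float = 1.2) -> int:
    """Rough FRA size: flashback logs for the retention window plus other FRA contents, with headroom."""
    flashback_gb = flashback_mb_per_min * retention_min / 1024
    return math.ceil((flashback_gb + other_fra_gb) * headroom)
```

For example, at 100 MB of flashback data per minute, a 60-minute retention target, and 10 GB of other FRA contents, `min_fra_size_gb(100, 60, 10)` returns 20 (GB).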

Use cases

With any high availability and disaster recovery solution, it’s important to make sure that the data loss is minimal and the recovery time is as fast as possible. Based on your business needs and HA/DR requirements, you could use this solution for the following use cases:

  • Your application requires write scaling with active-active writes, or you want to segregate the write and consistent read loads based on the application while still having access to all the data
  • You have strict HA/DR requirements with zero RPO and an RTO of a few minutes
  • You want to maintain high availability even when one Availability Zone fails
  • You want to have a DR solution even when there is a Region failure
  • You want to implement a highly scalable solution with bi-directional replication and conflict resolution

Clean up

To avoid future charges and to remove the components created while testing this use case, complete the following steps. Repeat these steps in every Region where you created databases.

  1. On the Amazon RDS console, select the database you set up, and on the Actions menu, choose Delete.
  2. On the Amazon EC2 console, select the EC2 instance that you used, and on the Actions menu, choose Terminate.
  3. On the Amazon EFS console, select the EFS file system that you created, and choose Delete.
  4. On the Amazon S3 console, select the S3 bucket that you created, empty it, and choose Delete.
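If you prefer to script the cleanup, the following Python sketch builds an ordered list of boto3 operations for one Region and runs them against clients you pass in. All resource IDs are hypothetical inputs, and deleting the standbys before the primaries is a conservative ordering choice in this sketch rather than a documented requirement.

```python
from typing import Any, Dict, List, Tuple

# (service, boto3 client operation, keyword arguments)
Step = Tuple[str, str, Dict[str, Any]]


def build_cleanup_plan(standby_db_ids: List[str], primary_db_ids: List[str],
                       observer_instance_ids: List[str], file_system_id: str,
                       mount_target_ids: List[str], bucket: str) -> List[Step]:
    """Order the teardown of one Region's test resources (all IDs are hypothetical inputs)."""
    plan: List[Step] = []
    # Remove the standbys (read replicas) before their primaries.
    for db_id in standby_db_ids + primary_db_ids:
        plan.append(("rds", "delete_db_instance",
                     {"DBInstanceIdentifier": db_id, "SkipFinalSnapshot": True}))
    plan.append(("ec2", "terminate_instances", {"InstanceIds": observer_instance_ids}))
    # EFS refuses to delete a file system that still has mount targets.
    for mt_id in mount_target_ids:
        plan.append(("efs", "delete_mount_target", {"MountTargetId": mt_id}))
    plan.append(("efs", "delete_file_system", {"FileSystemId": file_system_id}))
    # delete_bucket only succeeds on an empty bucket, so empty it first.
    plan.append(("s3", "delete_bucket", {"Bucket": bucket}))
    return plan


def execute_plan(plan: List[Step], clients: Dict[str, Any]) -> None:
    """Call each operation on the matching boto3 client, in order."""
    for service, operation, kwargs in plan:
        getattr(clients[service], operation)(**kwargs)
```

You could run it once per Region with `execute_plan(plan, {svc: boto3.client(svc, region_name="us-east-1") for svc in ("rds", "ec2", "efs", "s3")})`, substituting your own Region. These deletions are asynchronous, so a real script should wait between steps, for example with the RDS `db_instance_deleted` and EC2 `instance_terminated` waiters, and by polling `describe_mount_targets` until no mount targets remain. `SkipFinalSnapshot=True` discards the database data permanently.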

Conclusion

In this post, we discussed an architecture that uses RDS Custom for Oracle, Oracle Data Guard with FSFO, and GoldenGate to provide high availability and disaster recovery. Depending on your HA/DR requirements, you can use this architecture in whole or in part to achieve your desired outcome.

If you have any further questions, please leave a comment.


About the Authors

Vineet Agarwal is a Senior Database Specialist Solutions Architect with Amazon Web Services (AWS). Prior to AWS, Vineet worked for large enterprises in the financial, retail, and healthcare verticals, helping them with database and solutions architecture. In his spare time, you’ll find him playing poker, trying a new activity, or working on a DIY project.

Vishal Srivastava is a Senior Partner Solutions Architect specializing in databases at AWS. In his role, Vishal works with ISV Partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.

Maharshi Desai is a Worldwide GTM Specialist for RDS with Amazon Web Services. He defines and executes strategies to drive growth for the RDS business. He also leads the Healthcare and Life Sciences domain, aligning RDS strategies with industry use cases.

Wasim Shaikh is a Senior Partner Solutions Architect specializing in databases at AWS. He works with customers to provide guidance and technical assistance on various database and analytics projects, helping them improve the value of their solutions when using AWS.