AWS Database Blog

High availability for Oracle GoldenGate Microservices Architecture in AWS

Oracle GoldenGate is widely used replication software for low-downtime data migrations to AWS, real-time data ingestion to various analytical services, continuous data replication between AWS and on-premises systems, active-active replication, blue/green deployment, sharding of operational databases, and more. When designing a GoldenGate deployment architecture for these use cases, it’s important to consider high availability of the replication component to make sure that the solution meets your business SLA, such as acceptable data divergence.

GoldenGate supports Classic and Microservices Architectures. In Part 1 of this series, we covered high availability for GoldenGate classic architecture in AWS. In this post, we discuss a reference architecture for GoldenGate Microservices Architecture (MA) in AWS. The GoldenGate Hub is a widely used deployment model that simplifies operations and lessens the use of system resources on both the source and target systems. This is in contrast to the conventional method where GoldenGate components run directly on the source and target systems.

Solution overview

Because there are numerous use cases for GoldenGate with more than one deployment architecture, we focus on a high availability architecture for unidirectional replication that can be extended to other configurations as well. The following diagram illustrates this architecture.

In this reference architecture, we use GoldenGate MA Hub deployments for replication from an on-premises Oracle database to a target Oracle database running on Amazon Relational Database Service (Amazon RDS) for Oracle. A GoldenGate Hub instance in the source on-premises environment connects to the GoldenGate Hub systems in AWS through a Target-Initiated Distribution Path using an Amazon Route 53 A record, which automatically points to an Amazon Elastic Compute Cloud (Amazon EC2) instance acting as the primary GoldenGate Hub instance in AWS. The solution uses two Amazon EC2 instances running in active-passive mode for GoldenGate Hub with an Amazon Elastic File System (Amazon EFS) file system holding the deployment directories that need to be shared between the GoldenGate Hub instances.

We also discuss how to automate the failover process, which updates the Route 53 A record to point to the new primary GoldenGate Hub instance and starts the necessary GoldenGate services on that instance.

Prerequisites

This post assumes familiarity with GoldenGate MA and its various components, along with configuration steps using the Admin Client CLI and console. It also assumes knowledge of AWS services such as Amazon EC2, Amazon EFS, Amazon Route 53, and AWS Lambda, especially if you intend to automate the failover process.

Implementation configuration

Refer to Installing Oracle GoldenGate Microservices for detailed instructions for installation and configuration of GoldenGate MA for your specific needs. The following table summarizes the configuration used in this post. For the purpose of this post, the on-premises environment is simulated in AWS using a self-managed Oracle database and GoldenGate Hub instances running on EC2 instances.

Resource Name | Value | Remarks
GoldenGate MA version | 21.3 | Version for all three GoldenGate Hub instances.
On-premises GoldenGate Hub instance | 10.0.134.98 | IP of the on-premises GoldenGate Hub instance.
GoldenGate Hub instance on AZ1 | 10.0.131.86 | IP of the EC2 GoldenGate Hub instance in AZ1.
GoldenGate Hub instance on AZ2 | 10.0.146.207 | IP of the EC2 GoldenGate Hub instance in AZ2.
Target RDS instance connection URL | joejosp-gg-target.xxxxx.us-east-1.rds.amazonaws.com | Connection endpoint URL for the target RDS for Oracle instance.
Amazon EFS mount | /efs | Mount directory for EFS on the GoldenGate Hub instances.
GoldenGate installation directory | /opt/oracle/gg_inst | This directory is mounted on locally attached Amazon EBS storage.
Deployment directory: on-premises | /opt/oracle/gg_deps/sm, /opt/oracle/gg_deps/dep1 | For Service Manager and first deployment.
Deployment directory: AWS | /efs/gg/sm, /efs/gg/dep1 | For Service Manager and first deployment. This is on Amazon EFS for failover between EC2 GoldenGate Hub instances.
Route 53 A record to point to active EC2 GoldenGate Hub instance | ggaws.joejosp.com | This A record will point to the active GoldenGate Hub instance through the automation.
GoldenGate ports | 3001, 3002, 3003 | Admin, distribution, and receiver services, respectively.
Alias pointing to on-premises GoldenGate Hub instance | ggonprem.joejosp.com | For the purpose of this post, this is another A record in Route 53. For actual implementation, this could be the hostname or a DNS record in an on-premises DNS system.
Integrated Extract | EXT1 | This Extract process running on the on-premises GoldenGate Hub instance captures changes from the on-premises Oracle database.
Target-initiated receiver path | to_aws | This target-initiated distribution path ships trail files from the on-premises database to AWS over an SSL connection.
Integrated Replicat | REP1 | This Replicat process running on the target EC2 GoldenGate Hub instances replicates changes to the target RDS instance.

Configure a Route 53 A record

The Route 53 A record ggaws.joejosp.com points to the active Hub instance among the two EC2 GoldenGate Hub instances in AWS. Similarly, ggonprem.joejosp.com points to the on-premises GoldenGate Hub instance, depending on your on-premises DNS mechanism. For this post, ggonprem.joejosp.com is also an A record in Route 53. The screenshot below shows the Route 53 A records configured.

Later in the automated failover section, we discuss how this A record is updated to point to the new primary GoldenGate Hub instance in case of a failure.

Configure and mount Amazon EFS

Choose your Amazon EFS configuration to meet the performance requirements of your data replication and migration. We recommend Regional EFS, which stores data redundantly across multiple geographically separated Availability Zones within the same AWS Region. If Amazon EFS can’t meet your replication performance requirements, you may also consider Amazon FSx options such as Amazon FSx for OpenZFS or Amazon FSx for NetApp ONTAP.

In this post, we use an EFS file system mounted on EC2 GoldenGate Hub instances to host the deployment directories for Service Manager and dep1.

We use the following mount command:

mount -t nfs4 -o nfsvers=4.1,rw,bg,hard,rsize=32768,wsize=32768,tcp,actimeo=0,noac,timeo=600,retrans=2,noresvport EFS_DNS:/ /efs

We recommend mounting the file system using its DNS name instead of an IP address, because the DNS name automatically resolves to the IP address of the Amazon EFS mount target in the same Availability Zone as your Amazon EC2 instance, without calling external resources.

Refer to Oracle GoldenGate Best Practice: NFS Mount options for use with GoldenGate (Doc ID 1232303.1) for more details on the recommended NFS mount options for GoldenGate.

An /etc/fstab entry is configured to automatically mount Amazon EFS on the EC2 instances when they’re rebooted.
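As a sketch of what such an entry could look like, the helper below prints an /etc/fstab line that reuses the NFS options from the mount command shown earlier. The function name efs_fstab_line and the sample DNS name are illustrative assumptions. The _netdev option, a standard Linux mount option that delays mounting until the network is available, is also an assumption and is not taken from the configuration used in this post.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line for an EFS file system.
# $1 = EFS DNS name, $2 = local mount point.
efs_fstab_line() {
  # Same NFS options as the mount command in this post; _netdev is an added assumption.
  opts="nfsvers=4.1,rw,bg,hard,rsize=32768,wsize=32768,tcp,actimeo=0,noac,timeo=600,retrans=2,noresvport,_netdev"
  printf '%s:/ %s nfs4 %s 0 0\n' "$1" "$2" "$opts"
}

# Example with a placeholder DNS name; review the line before appending it to /etc/fstab.
efs_fstab_line "fs-EXAMPLE.efs.us-east-1.amazonaws.com" "/efs"
```

Printing the line instead of editing /etc/fstab directly keeps the sketch safe to run and lets you review the entry first.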

Install GoldenGate MA on premises and in AWS

For instructions to install GoldenGate MA on the on-premises GoldenGate Hub system and in EC2 instances, see Installing Oracle GoldenGate Microservices. Refer to the next section to configure SSL certificates for secure communication between two GoldenGate deployments.

In the on-premises system, the software is installed in the /opt/oracle/gg_inst directory and a deployment is created with the deployment directories /opt/oracle/gg_deps/sm and /opt/oracle/gg_deps/dep1. The dep1 deployment is configured with SSL using server and client wallets for secure communication between the source and target GoldenGate Hub instances using distribution and receiver services.

For GoldenGate Hub instances in AWS, GoldenGate MA is installed on the /opt/oracle/gg_inst directory on Amazon Elastic Block Store (Amazon EBS) storage on both EC2 instances individually, then you create the deployment for Service Manager and dep1 only from one of the EC2 instances with the deployment directory on /efs, which will be mounted on both instances. The following diagram shows the directory structure used by the installation in AWS.

Authentication and secure communication between GoldenGate Hub instances using SSL

Follow the instructions in Connecting Two Deployments Using a Common RootCA Certificate to create a server certificate tagged with ggonprem.joejosp.com (for the on-premises GoldenGate Hub instance), target certificate tagged with ggaws.joejosp.com (for EC2 GoldenGate Hub instances), and client certificates for secured communication between the distribution and receiver services. You create the following wallets:

  • /opt/oracle/server with commonName as joejosp.com to be used as the server wallet for the on-premises GoldenGate Hub deployment
  • /opt/oracle/target with commonName as joejosp.com to be used as the server wallet for the AWS GoldenGate Hub deployment
  • /opt/oracle/client for the client certificate for both cases

You can build these wallets in GoldenGate Hub instances and copy them to other systems as needed. For this post, we use self-signed certificates. In a production environment, certificates may be provided by a digital certificate authority such as DigiCert.

The following screenshot shows the details for configuring a wallet on the on-premises GoldenGate Hub instance.

The following screenshot shows the details for configuring a wallet on the EC2 GoldenGate Hub instance.

Prepare the source and target databases

Refer to Preparing the Database for Oracle GoldenGate for instructions on preparing the source on-premises Oracle database to support GoldenGate replication, and refer to Setting up a source database for use with Oracle GoldenGate on Amazon RDS to configure the target RDS for Oracle instance to support GoldenGate replication.

Both the source and target databases are configured to support GoldenGate replication with archive log mode and supplemental logging enabled, and the ggs_admin user is created for the Extract and Replicat processes to connect to the source and target instances with the necessary privileges. The table GGS_ADMIN.TEST exists in the source and target databases, which you use for testing the replication.

Create and run the Extract process

Create an Integrated Extract EXT1 from Admin Service of the dep1 deployment in the on-premises GoldenGate Hub instance to capture changes from the source Oracle database instance with the credential alias gg_source_db pointing to the ggs_admin schema in the source database.

Create credential alias gg_source_db pointing to the source database

Create integrated extract

The following screenshots show the Extract EXT1 properties after it is created.


Create and run the distribution path

You can create a distribution path on the source side that sends trail files from the on-premises GoldenGate Hub instance to the AWS GoldenGate Hub instances. However, in scenarios where the distribution server can’t initiate connections to the receiver server, you can create a target-initiated distribution path, in which the receiver server establishes the connection to the distribution service and pulls the requested trail files. In this example, we create a target-initiated distribution path, as shown in the following screenshots.

Create and run the Replicat process

In this step, we create and start an Integrated Replicat process to apply the changes from the trail files of the distribution path to the target RDS for Oracle instance using the credential alias gg_target_db, as shown in the following screenshots.

Create credential gg_target_db pointing to the target RDS instance

Create integrated replicat rep1

The following screenshot shows the properties of the Replicat after it is created.

Test failure scenarios

Now that end-to-end replication topology is constructed, we can test various failure scenarios.

In a GoldenGate Hub deployment model in AWS, the failover decision can be based on the health of various components involved in the architecture, such as:

  • Health of the EC2 instances
  • Health of the Route 53 A record
  • Health of the EBS and EFS file systems
  • Health of GoldenGate services
  • Health of GoldenGate processes such as Extract, Replicat, and distribution path
  • Network timeout, which causes a broken replication flow even though all services and processes are shown as healthy

GoldenGate MA provides various REST APIs that you can use to health-check its processes and services. Using those REST API calls to make failover decisions would be a better approach than depending on the health of the EC2 instance itself. You may find scenarios where EC2 instances are healthy, but some GoldenGate processes are not functional. We discuss those examples later in this post.

In this section, we cover two common failure scenarios for reference.

Failover of the target RDS instance

Although this test is not directly related to the availability of the GoldenGate Hub instances, it verifies that the Replicat process can automatically resume the operation by reconnecting to the new primary instance after the failover operation.

  1. Using adminclient, configure a profile for the Replicat with restart properties, which allows it to reestablish connectivity to the target RDS instance as soon as the failover is complete:
    ./adminclient
    connect https://ggaws.joejosp.com:3000 deployment dep1 as admin password <pwd> !
    ADD PROFILE critical AUTOSTART AUTORESTART RETRIES 7 WAITSECONDS 30 RESETSECONDS 0 DISABLEONFAILURE NO
    stop rep1
    alter replicat rep1 profile critical
    start rep1
  2. Reboot the RDS for Oracle instance with failover.

It will take less than 2 minutes for the RDS failover to complete.

  3. Verify that the Replicat process restarted automatically after abending.

  4. Insert a record at the source on-premises database and verify that it is successfully replicated to the target RDS instance.

In this test, we observed how the GoldenGate Replicat process can automatically resume operation after a failover of the target RDS for Oracle instance without manual intervention.
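As a rough sanity check on the profile values used in this test, the retry window can be estimated as RETRIES multiplied by WAITSECONDS. The sketch below assumes that the process waits WAITSECONDS before each retry and ignores process startup time. The function name retry_window_seconds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: approximate how long AUTORESTART keeps retrying.
# $1 = RETRIES, $2 = WAITSECONDS.
retry_window_seconds() {
  echo $(( $1 * $2 ))
}

window=$(retry_window_seconds 7 30)   # values from the "critical" profile
failover=120                          # RDS failover observed in under 2 minutes
if [ "$window" -gt "$failover" ]; then
  echo "retry window ${window}s covers a ${failover}s failover"
else
  echo "retry window ${window}s is shorter than a ${failover}s failover"
fi
```

If your failover takes longer than the estimated window, increasing RETRIES or WAITSECONDS in the profile is one way to widen it.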

Failure of the current active EC2 GoldenGate Hub instance

In this test, we stop the EC2 instance that is playing the primary role of the GoldenGate Hub instance in AWS. To resume the replication, we complete the following steps manually on the second EC2 instance to make it active:

  1. Stop the primary GoldenGate Hub EC2 instance.

  2. Connect to the secondary EC2 instance and verify that Amazon EFS is mounted on the GoldenGate Hub EC2 instance in AZ2.

  3. Navigate to the Amazon Route 53 console and update the Route 53 A record to point to the new primary GoldenGate Hub EC2 instance.

  4. Start the Service Manager, which will auto-start the deployment and the GoldenGate processes such as the replicat and the distribution path.

  5. Verify the health of the distribution path to_aws.

  6. Verify end-to-end replication by inserting a test record at the source Oracle database and checking its availability in the target RDS instance.

In this test, we observed how the GoldenGate deployment can resume operation in case of a failover of the GoldenGate Hub EC2 instance.

Automate failovers

The automation of the failover actions depends heavily on the deployment architecture, configuration options, and environment-specific factors, such as the following:

  • If the GoldenGate MA hosts multiple deployments, should the failover include all deployments or only the specific deployment impacted?
  • When the target RDS instance fails over, should that invoke the failover of the GoldenGate Hub instance as well to keep both Hub and RDS instances in the same Availability Zone?
  • When the replication lag goes above the acceptable threshold (probably due to the GoldenGate Hub instance and RDS instance running in different Availability Zones), should a failover occur?

Instead of providing a ready-to-deploy automated failover solution, this post provides guidance to build an automation based on your specific replication needs and GoldenGate MA architecture. Because this post covers the high availability of the receiver and replicat processes in AWS, the example commands shown in this section are for those processes and related architecture components.

You can automate those commands using one of the following approaches to build an end-to-end automated failover solution for your specific configuration.

As your first option, you can schedule an AWS Lambda function to run the health-check of various GoldenGate processes and services. If a failure is detected, a failover action is invoked to bring up the services on the surviving EC2 instance. The Lambda function can use GoldenGate REST API calls and use AWS Systems Manager documents to trigger specific commands and scripts on the EC2 GoldenGate Hub instance as part of the health-check and failover. You can also use Amazon EventBridge for scheduling this Lambda function. The following diagram illustrates this architecture.

As an alternate option, you can schedule shell scripts as cron jobs on the EC2 GoldenGate Hub instances to perform the health checks and automated failover actions. To avoid a split-brain scenario where both EC2 Hub instances assume the primary role simultaneously, you can use a heartbeat file on the shared EFS file system. For example, the script on the active instance periodically writes the latest timestamp to the heartbeat file, and the instance currently playing the passive role takes over the primary role only if the primary instance is no longer updating the timestamp in the heartbeat file.
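The heartbeat idea can be sketched with two small shell functions. This is a minimal illustration under stated assumptions, not a complete failover script: the function names, the heartbeat file location, and the 90-second threshold are all hypothetical, and the comparison assumes the clocks of both instances are reasonably in sync.

```shell
#!/bin/sh
# Hypothetical heartbeat helpers for an active-passive pair sharing a file system.

# Active node: record the current epoch time in the heartbeat file.
write_heartbeat() {
  # Write to a temporary file and rename it so readers are unlikely to see a partial write.
  date +%s > "$1.tmp" && mv "$1.tmp" "$1"
}

# Passive node: succeed (exit 0) only when the recorded time is older than max_age.
# $1 = heartbeat file, $2 = max age in seconds, $3 = current epoch time (optional).
heartbeat_is_stale() {
  now=${3:-$(date +%s)}
  # A missing or unreadable file may mean the mount is broken, so it is not treated as stale.
  [ -r "$1" ] || return 1
  last=$(cat "$1")
  case "$last" in ''|*[!0-9]*) return 1 ;; esac
  [ $(( now - last )) -gt "$2" ]
}

# Example run; HEARTBEAT_FILE and the 90-second threshold are placeholder values.
HEARTBEAT_FILE=${HEARTBEAT_FILE:-/tmp/gg_heartbeat}
write_heartbeat "$HEARTBEAT_FILE"
if heartbeat_is_stale "$HEARTBEAT_FILE" 90; then
  echo "heartbeat stale: passive node may take over"
else
  echo "heartbeat fresh or unknown: no takeover"
fi
```

Treating an unreadable heartbeat file as "unknown" instead of "stale" is a deliberately conservative choice: a passive instance with a broken EFS mount should not promote itself.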

Updating Route 53 A record

The following AWS Command Line Interface (AWS CLI) commands show how to list the current value of an A record and update it to point to the IP address of the second EC2 instance:

aws route53 list-resource-record-sets --hosted-zone-id <ZONE ID>
aws route53 change-resource-record-sets --hosted-zone-id <ZONE ID> --change-batch file:///opt/oracle/gg/updatecname.json

Contents of updatecname.json:

{
  "Comment": "Update A record",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "ggaws.joejosp.com",
        "Type": "A",
        "TTL": 30,
        "ResourceRecords": [
          {
            "Value": "10.0.146.207"
          }
        ]
      }
    }
  ]
}

This can be automated using one of the options discussed earlier.
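For example, a failover script could generate the change batch for whichever instance is becoming primary instead of keeping a static JSON file. The following sketch assumes a hypothetical helper named make_change_batch; the record name and IP address are the ones used in this post, and the AWS CLI call is left as a comment so the sketch runs without AWS credentials.

```shell
#!/bin/sh
# Hypothetical helper: print a Route 53 change batch that points an A record at a new IP.
# $1 = record name, $2 = IPv4 address, $3 = TTL in seconds (defaults to 30).
make_change_batch() {
  cat <<EOF
{
  "Comment": "Point $1 to $2",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "A",
        "TTL": ${3:-30},
        "ResourceRecords": [ { "Value": "$2" } ]
      }
    }
  ]
}
EOF
}

# Example: write the batch for the AZ2 instance to a temporary file and show it.
batch_file=$(mktemp)
make_change_batch "ggaws.joejosp.com" "10.0.146.207" > "$batch_file"
cat "$batch_file"
# The file could then be passed to the AWS CLI, for example:
#   aws route53 change-resource-record-sets --hosted-zone-id <ZONE ID> --change-batch "file://$batch_file"
```

A short TTL such as the 30 seconds used in this post limits how long clients keep resolving the old IP address after the record changes.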

Mount Amazon EFS automatically when Amazon EC2 boots up

You can mount an EFS file system automatically when EC2 instances come up by using /etc/fstab entries. Refer to Mounting your Amazon EFS file system automatically.

Check the health of GoldenGate services and processes using REST APIs

The following code samples check the health of various GoldenGate processes using the REST API. You can run them with the curl command. The first sample checks the status of a Replicat process:

curl -s -k -u $user:"$password" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-X GET https://<host>:3001/services/v2/replicats/<repname>/info/status | jq '.response'

Example output:

{
  "$schema": "ogg:replicatStatus",
  "status": "running",
  "processId": 8974,
  "lastStarted": "2024-04-27T23:55:45.196Z",
  "lag": 0,
  "sinceLagReported": 6,
  "position": {
    "path": "/efs/gg/dep1/var/lib/data/",
    "name": "rc",
    "sequence": 13,
    "offset": 1456
  }
}
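A health-check script could turn the status and lag fields into a simple pass or fail result. The sketch below is one way to do that; the function name replicat_is_healthy and the 300-second threshold are hypothetical, and it assumes the lag value is reported in seconds.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a Replicat looks healthy.
# $1 = status string, $2 = lag (assumed seconds), $3 = maximum acceptable lag.
replicat_is_healthy() {
  [ "$1" = "running" ] && [ "$2" -le "$3" ]
}

# Values as they appear in the example output (status "running", lag 0);
# the 300-second threshold is a placeholder.
if replicat_is_healthy "running" 0 300; then
  echo "REP1 healthy"
else
  echo "REP1 needs attention"
fi
```

In a real script, the two inputs could be extracted from the curl response with jq filters such as '.response.status' and '.response.lag', following the jq usage in the sample command.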

Check the health of the receiver path with the following code:

curl -s -k -u $user:"$password" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-X GET https://<host>:3003/services/v2/targets/<recvpathname>

Example output:

{
  "name": "to_aws",
  "description": "to_aws",
  "status": "running",
  "source": {
    "uri": "wss://ggonprem.joejosp.com:3002/services/v2/sources?trail=ea",
    "authenticationMethod": {
      "certificate": "default"
    }
  },
  "target": {
    "uri": "trail://ggaws.joejosp.com:3003/services/v2/targets?trail=rc",
    "details": {
      "trail": {
        "seqLength": 9,
        "sizeMB": 500
      },
      "encryption": {
        "algorithm": "NONE"
      },
      "compression": {
        "enabled": false
      }
    }
  },
  "options": {
    "eofDelayCSecs": 10,
    "checkpointFrequency": 10,
    "critical": false,
    "autoRestart": {
      "retries": 10,
      "delay": 2
    },
    "streaming": true
  },
  "targetInitiated": true,
  "begin": {
    "sequence": 0,
    "offset": 0
  },
  "$schema": "ogg:distPath"
}

Check the status of Service Manager and other services in the deployment:

curl -s -k -u  $user:"$password" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-X GET https://<host>:3000/services/v2/config/health

Example output:

{"$schema":"ogg:health","deploymentName":"ServiceManager","serviceName":"ServiceManager","started":"2024-04-27T23:55:19.217Z","healthy":true,"criticalResources":[{"deploymentName":"dep1","name":"adminsrvr","type":"service","status":"running","healthy":true},{"deploymentName":"dep1","name":"distsrvr","type":"service","status":"running","healthy":true},{"deploymentName":"dep1","name":"recvsrvr","type":"service","status":"running","healthy":true}]}
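To act on this response in a script, one simple approach is to fail as soon as any entry reports "healthy": false. The sketch below uses grep to stay dependency-free; this is an assumption made for illustration, and a JSON-aware tool such as jq would be more robust. The function name all_services_healthy and the shortened sample document are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a Service Manager health document on stdin and
# succeed only if no entry reports "healthy": false.
all_services_healthy() {
  doc=$(cat)
  # Fail when the response is empty or has no "healthy" field (for example, a failed request).
  printf '%s' "$doc" | grep -q '"healthy"' || return 1
  ! printf '%s' "$doc" | grep -q '"healthy"[[:space:]]*:[[:space:]]*false'
}

# Shortened sample document in which the receiver service is unhealthy.
sample='{"healthy":true,"criticalResources":[{"name":"adminsrvr","healthy":true},{"name":"recvsrvr","healthy":false}]}'
if printf '%s' "$sample" | all_services_healthy; then
  echo "all services healthy"
else
  echo "at least one service is unhealthy"
fi
```

In practice, the output of the curl command for the /services/v2/config/health endpoint would be piped into the function in place of the sample string.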

Start/Stop GoldenGate services and processes using REST APIs

The following sample code shows how to start or stop the Replicat and Extract processes using REST API.

curl -s -k -u $user:"$password" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-X POST https://<host>:3001/services/v2/commands/execute \
-d '{ "name":"start", "processName":"<repname>"}'

curl -s -k -u $user:"$password" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-X POST https://<host>:3001/services/v2/commands/execute \
-d '{ "name":"stop", "processName":"<extname>"}'

Example output:

[oracle@ip-10-0-146-207 gg]$ curl -s -k -u  admin:"admin123" \
> -H "Content-Type: application/json" \
> -H "Accept: application/json" \
> -X POST https://ggaws.joejosp.com:3001/services/v2/commands/execute \
> -d '{ "name":"stop", "processName":"'REP1'"}'
{"$schema":"api:standardResponse","links":[{"rel":"canonical","href":"https://ggaws.joejosp.com:3001/services/v2/commands/execute","mediaType":"application/json"},{"rel":"self","href":"https://ggaws.joejosp.com:3001/services/v2/commands/execute","mediaType":"application/json"}],"messages":[{"$schema":"ogg:message","title":"Sending STOP request to Replicat group REP1.","code":"OGG-08100","severity":"INFO","issued":"2024-04-28T00:58:56Z","type":"http://docs.oracle.com/goldengate/c2130/gg-winux/GMESG/oggus.htm#OGG-08100"},{"$schema":"ogg:message","title":"Replicat group REP1 is down (gracefully).","code":"OGG-02965","severity":"INFO","issued":"2024-04-28T00:58:56Z","type":"http://docs.oracle.com/goldengate/c2130/gg-winux/GMESG/oggus.htm#OGG-02965"}]}

Use the following code to start or stop the receiver path:

curl -s -k -u $user:"$password"  \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{"status": "running"}' \
-X PATCH https://<host>:3003/services/v2/targets/<recvpathname>

curl -s -k -u $user:"$password"  \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{"status": "stopped"}' \
-X PATCH https://<host>:3003/services/v2/targets/<recvpathname>

Example output:

[oracle@ip-10-0-146-207 gg]$ curl -s -k -u  admin:"admin123"  \
> -H "Content-Type: application/json" \
> -H "Accept: application/json" \
> -d '{"status": "stopped"}' \
> -X PATCH https://<host>:3003/services/v2/targets/<pathname>
{"$schema":"api:standardResponse","links":[{"rel":"canonical","href":"https://ggaws.joejosp.com:3003/services/v2/targets/to_aws","mediaType":"application/json"},{"rel":"self","href":"https://ggaws.joejosp.com:3003/services/v2/targets/to_aws","mediaType":"application/json"}],"messages":[{"$schema":"ogg:message","title":"The path 'to_aws' has been stopped.","code":"OGG-08514","severity":"INFO","issued":"2024-04-28T01:05:07Z","type":"http://docs.oracle.com/goldengate/c2130/gg-winux/GMESG/oggus.htm#OGG-08514"}]}

Check the health of EC2 GoldenGate Hub instance

This code sample shows how to check the health of an EC2 instance using AWS CLI.

aws ec2 describe-instance-status --instance-ids <id>

Example output:

{
  "InstanceStatuses": [
    {
      "AvailabilityZone": "us-east-1b",
      "InstanceId": "i-041fe9f5156d130da",
      "InstanceState": {
        "Code": 16,
        "Name": "running"
      },
      "InstanceStatus": {
        "Details": [
          {
            "Name": "reachability",
            "Status": "passed"
          }
        ],
        "Status": "ok"
      },
      "SystemStatus": {
        "Details": [
          {
            "Name": "reachability",
            "Status": "passed"
          }
        ],
        "Status": "ok"
      }
    }
  ]
}
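For scripting, the three fields of interest (instance state, instance status, and system status) can be reduced to a single line with the AWS CLI --query and --output text options and then checked in the shell. The sketch below shows only the checking step so that it runs without AWS credentials; the function name ec2_is_healthy is hypothetical. Note that describe-instance-status returns only running instances unless --include-all-instances is specified.

```shell
#!/bin/sh
# Hypothetical helper: interpret "<state> <instance-status> <system-status>".
ec2_is_healthy() {
  # Intentionally unquoted so the shell splits the three fields on whitespace.
  set -- $1
  [ "$1" = "running" ] && [ "$2" = "ok" ] && [ "$3" = "ok" ]
}

# The summary line could come from the AWS CLI, for example:
#   aws ec2 describe-instance-status --instance-ids <id> --include-all-instances \
#     --query 'InstanceStatuses[0].[InstanceState.Name,InstanceStatus.Status,SystemStatus.Status]' \
#     --output text
summary="running ok ok"   # sample value matching the example output
if ec2_is_healthy "$summary"; then
  echo "instance healthy"
else
  echo "instance unhealthy or not running"
fi
```

As discussed earlier in this post, a healthy EC2 instance does not guarantee healthy GoldenGate processes, so a check like this is best combined with the REST API health checks.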

Failover decision tree

As discussed earlier, the decision to perform a failover to another EC2 GoldenGate Hub instance is based on various factors. The following decision tree shows a reference example.

Conclusion

In this post, we highlighted the importance of designing a high availability solution for the replication component and proposed a reference architecture with various patterns and example REST API commands that you can extend to build a fully automated, high availability solution for your GoldenGate MA deployment in AWS.

Share your feedback in the comments section.


About the Author

Jobin Joseph is a Senior Database Specialist Solution Architect based in Toronto. With a focus on relational database engines, he assists customers in migrating and modernizing their database workloads to AWS. He is an Oracle Certified Master with over 20 years of experience with Oracle databases.