AWS Architecture Blog
Enable transparent connectivity to Oracle Data Guard environments using Amazon Route 53 CNAME records
Customers choose AWS for running their Oracle database workload to help increase resiliency, performance, and scalability of the database layer. A high availability (HA) solution for the database stack is an important aspect to consider when migrating or deploying Oracle databases in AWS to help ensure that the architecture can meet the service level agreement (SLA) of the application. Customers who run their Oracle databases on Amazon Elastic Compute Cloud (Amazon EC2) commonly choose Oracle Data Guard physical standby databases to help meet the HA and disaster recovery (DR) for their Oracle database workloads.
As discussed in this Oracle documentation, role-based services with multiple listener endpoints in the connection URL or tnsnames.ora
entry is the preferred way to transparently connect to the database layer that is part of a Data Guard configuration. However, some application components and driver configurations don’t support multiple hostnames in the connection URL. Those applications require a single hostname or IP for the clients to connect to the Data Guard environment.
This post talks about the concept of using an Amazon Route 53 CNAME record in a Data Guard environment on EC2 and lists the artifacts to automatically route the connection between primary and standby environments in a Data Guard configuration based on the database role.
Solution overview
To help avoid the manual efforts to update DNS entries or tnsnames.ora
file after a failover or switchover operation in a Data Guard environment, the solution uses an AFTER DB_ROLE_CHANGE
trigger to automate the DNS failover process. This trigger runs a shell script on the database host, which in turn updates the CNAME record in Route 53 to point the CNAME records to reflect the role transition. The following diagram illustrates the solution architecture (Figure 1).
The solution discussed in this post covers routing new database connection requests to the right database post a Data Guard switchover activity. However, other factors such as application/client TTL settings and behavior of the connection pool to invalidate the connection handles created prior to the switchover activity can cause the application to connect to the database with a different role (like read-write workloads are connected to standby after switchover) and can generate errors, such as ORA-16000: database or pluggable database open for read-only access. It is a best practice to verify the database role before using the connection handles for transactions to verify that the application is connected to the database with the expected role.
The following workflow depicts the sequence of events that happens during a failover or switchover activity in a Data Guard environment to enable seamless connectivity for the application:
- A role transition event occurs in the Data Guard environment.
- The event triggers the
AFTER DB_ROLE_CHANGE
trigger. - The trigger runs the shell script on the EC2 instance using a scheduler job.
- The shell script updates Route 53 to point the CNAME records to reflect the role transition.
Prerequisites
This post assumes the following prerequisites:
- You should have an existing Data Guard configuration with one primary and one standby DB instance within a single VPC. Refer to the Oracle quick start template to deploy a Data Guard environment on Amazon EC2.
- The steps discussed here are for self-managed Data Guard configuration on Amazon EC2 with Red Hat Linux AMI.
- The scenario discussed in the post involves one primary and one standby database in the Data Guard configuration. For any other configurations, the scripts shown in this example require additional changes.
- A private or public Route 53 hosted zone should be configured in the VPC where the DB environment exists.
- The shell script uses the instance profile of the EC2 instance to run the AWS Command Line Interface (AWS CLI) commands. Make sure that the instance profile of the EC2 instances hosting the primary and standby databases has a policy attached that allows changing the record set in the hosted zone such as the following:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DBCnameFlipPloicy",
"Effect": "Allow",
"Action": [
"route53:ChangeResourceRecordSets",
"route53:ListResourceRecordSets"
],
"Resource": "arn:aws:route53:::hostedzone/<<YourHostedZoneId>>"
}
]
}
Nslookup
,jq
, andcurl
utilities must be installed on all of the DB hosts. If not installed, you can install the utility on RHEL Linux using the following command:
yum install -y bind-utils
yum install -y curl
yum install -y jq
Environment details
This post assumes a Data Guard configuration with two instances within a single VPC, one primary and one standby, with the following details and naming conventions:
- Oracle database version – 19.10 configured in maximum performance mode with Active Data Guard
- Route 53 domain name –
mydbdomain
- Database name –
orcl
- DB_UNIQUE_NAME –
orcl_a
andorcl_b
- Instance names –
orcl
- Route 53 A record for the host in AZ1 –
orcl-a-db.mydbdomain
- Route 53 A record for the host in AZ2 –
orcl-b-db.mydbdomain
Route 53 configuration
Two A records are created in Route 53 to point to the IPs of the primary and standby hosts. Two CNAME records are also created in Route 53, which are automatically updated during the Data Guard switchover and failover scenarios. The CNAME record orcl-rw.mydbdomain
points to the instance in the primary role that can accept read/write transactions, and orcl-ro.mydbdomain
points to the instance in the standby role that accepts read-only queries.
The A records configuration is as follows:
- DB host IP in AZ1 (10.0.0.5 in this example) –
orcl-a-db.mydbdomain
- DB host IP in AZ2 (10.0.32.5 in this example) –
orcl-b-db.mydbdomain
The CNAME records configuration is as follows:
- orcl-a-db.mydbdomain –
orcl-rw.mydbdomain
- orcl-b-db.mydbdomain –
orcl-ro.mydbdomain
The following screenshot shows the Route 53 console view of the domain mydbdomain
.
TNS configuration
The following tnsnames.ora
file entries show how connections can be made to primary and standby databases using the CNAME records without a dependency on the actual IP address of the EC2 instances that host primary and standby databases. The entry orcl_a
always points to the instance on orcl-a-db.mydbdomain
, and orcl_b
always points to the instance on orcl-b-db.mydbdomain
, regardless of their roles. The entries orclrw
and orclro
direct the connection to the databases playing primary and standby roles, respectively.
orcl_a =
(description =
(address = (protocol = tcp)(host = orcl-a-db.mydbdomain)(port = 1525))
(connect_data =
(server = dedicated)
(service_name = orcl_a)
)
)
orcl_b =
(description =
(address = (protocol = tcp)(host = orcl-b-db.mydbdomain)(port = 1525))
(connect_data =
(server = dedicated)
(service_name = orcl_b)
)
)
orclrw =
(description =
(address = (protocol = tcp)(host = orcl-rw.mydbdomain)(port = 1525))
(connect_data =
(server = dedicated)
(service_name = orcl)
)
)
orclro =
(description =
(address = (protocol = tcp)(host = orcl-ro.mydbdomain)(port = 1525))
(connect_data =
(server = dedicated)
(service_name = orcl)
)
)
To enable connectivity using orclrw
and orclro
TNS entries, you can use either a role-based service or a static listener registration entry in both the primary and standby listener, as shown in the following code:
SID_DESC =
(GLOBAL_DBNAME = orcl)
(ORACLE_HOME = /opt/oracle/product/19c/dbhome_1)
(SID_NAME = orcl)
)
Implement the solution
To implement an automated DNS update during an Oracle switchover or failover, we use an Oracle database trigger and a shell script. The following are the high-level steps for the entire workflow:
- Create a
DB_ROLE_CHANGE ON DATABASE
trigger on the primary database - The trigger in turn creates a DBMS job that calls a shell script with the
cname_switch.sh
. - The shell script updates the Route 53 CNAME entries.
Database trigger
Use the following code for the database trigger:
CREATE OR REPLACE TRIGGER sys.cname_flip_post_role_change
AFTER DB_ROLE_CHANGE ON DATABASE
DECLARE
v_db_name VARCHAR2(9);
v_db_role VARCHAR2(16);
BEGIN
SELECT DATABASE_ROLE INTO v_db_role FROM V$DATABASE;
SELECT DB_UNIQUE_NAME INTO v_db_name FROM V$DATABASE;
IF v_db_role = 'PRIMARY' THEN
BEGIN
dbms_scheduler.drop_job('RW_CNAME_FLIP');
EXCEPTION
WHEN OTHERS THEN NULL;
END;
dbms_scheduler.create_job(
job_name => 'RW_CNAME_FLIP',
job_type => 'EXECUTABLE',
number_of_arguments => 1,
job_action => '/home/oracle/admin/bin/cname_switch.sh',
enabled => false,
auto_drop => true);
dbms_scheduler.set_job_argument_value(
job_name => 'RW_CNAME_FLIP',
argument_position => 1,
argument_value => v_db_name);
BEGIN
dbms_scheduler.run_job('RW_CNAME_FLIP');
EXCEPTION
WHEN OTHERS THEN
raise_application_error(-20101, 'CNAME flip failed, check script error');
END;
END IF;
EXCEPTION
WHEN OTHERS THEN
raise_application_error(-20102, 'CNAME flip failed due to error: ' || SQLERR
M);
END;
/
Shell script
This script determines the current CNAME, identifies the dependent A records, and maps the CNAME to the correct A records accordingly. This shell script is provided for reference assuming the naming conventions for db_name
and db_unique_name
as used in the sample configuration. You should review and modify the script to meet your specific requirements and organization standards.
As per the example shown earlier, the shell script is placed in the location /home/oracle/admin/bin/cname_switch.sh
.
Note: it’s common to see production databases that are restored or cloned to lower environments.
If the script is run in those environments, it can potentially change the CNAME entries unexpectedly. To mitigate this, the shell script has the function restore_safeguard
. This function checks that the IP assigned to the EC2 instance is actually matching with the A records configured for this database in Route 53. If no match is found, this will not perform CNAME failover.
#! /bin/bash
#set -x
# Variables may need to be changed to suit your environment
DB_NAME=$1
DB_IN=$1
echo "Orginal Input : ${DB_NAME}"
DB_NAME=`echo "${DB_NAME::-2}"` # removing last 2 characters from DB_UNIQUE_NAME
DB_NAME=`echo "${DB_NAME}" | tr '[:upper:]' '[:lower:]'`
echo "Modified Input : ${DB_NAME}"
DB_DOMAIN=<<YOUR_AWS_ROUTE53_DOMAIN_NAME>> # Update as per your AWS Route53 domian name
ZONE_ID=<<YOUR_AWS_ROUTE53_HOSTED_ZONE_ID>> # Update as per your AWS Route53 hosted zone ID
EC2_METADATA='http://169.254.169.254/latest/dynamic/instance-identity/document'
# CNAME and A-Records related varables :
RW_CNAME=`echo "${DB_NAME}-rw.${DB_DOMAIN}"`
RO_CNAME=`echo "${DB_NAME}-ro.${DB_DOMAIN}"`
A_CNAME=`echo "${DB_NAME}-a-db.${DB_DOMAIN}"`
B_CNAME=`echo "${DB_NAME}-b-db.${DB_DOMAIN}"`
REGION=`curl -s ${EC2_METADATA}|grep region|awk -F\" '{print $4}'`
# Logfile configuration and file initilization
TS=`date +%Y%m%d_%H%M%S`
LOG_DIR=/tmp
CHANGE_SET_FILE=`echo "${LOG_DIR}/${DB_NAME}-CnameFlip-${TS}.json"`
LOG_FILE=`echo "${LOG_DIR}/${DB_NAME}-CnameFlip-${TS}.log"`
CONF_FILE=`echo "file://${CHANGE_SET_FILE}"`
# Function to check if current host IP matching with Route 53 configuration
IS_SAFE='Unsafe'
function restore_safeguard()
{
AWS_TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`
LOCAL_IPV4=`curl -sH "X-aws-ec2-metadata-token: $AWS_TOKEN" -v http://169.254.169.254/latest/meta-data/local-ipv4`
PUBLIC_IPV4=`curl -sH "X-aws-ec2-metadata-token: $AWS_TOKEN" -v http://169.254.169.254/latest/meta-data/public-ipv4`
NOT_FOUND=`echo ${PUBLIC_IPV4} | grep '404 - Not Found' | wc -l`
if [ ${NOT_FOUND} == 1 ]; then
PUBLIC_IPV4='No Public IP Assigned'
fi
A_IP=$(aws route53 list-resource-record-sets --hosted-zone-id ${ZONE_ID} \
--query 'ResourceRecordSets[?Type==`A`].{Name: Name, Value:ResourceRecords[0].Value}' | \
jq -cr --arg DB_NAME "${DB_NAME}-a" '.[] | select( .Name | contains($DB_NAME)).Value')
B_IP=$(aws route53 list-resource-record-sets --hosted-zone-id ${ZONE_ID} \
--query 'ResourceRecordSets[?Type==`A`].{Name: Name, Value:ResourceRecords[0].Value}' | \
jq -cr --arg DB_NAME "${DB_NAME}-b" '.[] | select( .Name | contains($DB_NAME)).Value')
PREVIOUS_RW_ID=$(aws route53 list-resource-record-sets --hosted-zone-id ${ZONE_ID} \
--query 'ResourceRecordSets[?Type==`CNAME`].{Name: Name, Value:ResourceRecords[0].Value}' | \
jq -cr --arg DB_NAME "${DB_NAME}-rw" '.[] | select( .Name | contains($DB_NAME)).Value' | cut -d'-' -f2)
if [ ${PREVIOUS_RW_ID} == 'a' ]; then
RW_NODE_IP=${A_IP}
RO_NODE_IP=${B_IP}
else
RW_NODE_IP=${B_IP}
RO_NODE_IP=${A_IP}
fi
# Looging Input values
echo "Orginal Input : ${DB_IN}" | tee -a ${LOG_FILE}
echo "Modified Input : ${DB_NAME}" | tee -a ${LOG_FILE}
echo "Current RW ID : ${PREVIOUS_RW_ID}" | tee -a ${LOG_FILE}
echo "Host Private IP : ${LOCAL_IPV4}" | tee -a ${LOG_FILE}
echo "Host Public IP : ${PUBLIC_IPV4}" | tee -a ${LOG_FILE}
echo "A Node IP : ${A_IP}" | tee -a ${LOG_FILE}
echo "A Node IP : ${B_IP}" | tee -a ${LOG_FILE}
echo "RW Node IP : ${RW_NODE_IP}" | tee -a ${LOG_FILE}
echo "RO Node IP : ${RO_NODE_IP}" | tee -a ${LOG_FILE}
if [ "${LOCAL_IPV4}" == "${RO_NODE_IP}" -o "${PUBLIC_IPV4}" == "${RO_NODE_IP}" ]; then
IS_SAFE='Safe'
else
IS_SAFE='Unsafe'
fi
}
restore_safeguard
if [ ${IS_SAFE} == 'Safe' ]; then
echo "Safe for CNAME faliover..." | tee -a ${LOG_FILE}
else
echo "Unsafe for CNAME faliover..." | tee -a ${LOG_FILE}
echo "Aborting..."
exit 1
fi
PRI_DB_ID=`nslookup ${RW_CNAME}|grep "canonical name"|cut -d'=' -f2|cut -d'-' -f2`
# Looging Input values :
echo "Orginal Input : ${DB_IN}" | tee ${LOG_FILE}
echo "Modified Input : ${DB_NAME}" | tee -a ${LOG_FILE}
echo "Current RW host ID : ${PRI_DB_ID}" | tee -a ${LOG_FILE}
echo -e "\nChange to be done : \n" | tee -a ${LOG_FILE}
if [ ${PRI_DB_ID} == 'a' ]; then
echo "Changing ${RW_CNAME} from ${A_CNAME} to ${B_CNAME}" | tee -a ${LOG_FILE}
echo "Changing ${RO_CNAME} from ${B_CNAME} to ${A_CNAME}" | tee -a ${LOG_FILE}
TO_BE_RW_CNAME=${B_CNAME}
TO_BE_RO_CNAME=${A_CNAME}
else
echo "Changing ${RW_CNAME} from ${B_CNAME} to ${A_CNAME}" | tee -a ${LOG_FILE}
echo "Changing ${RO_CNAME} from ${A_CNAME} to ${B_CNAME}" | tee -a ${LOG_FILE}
TO_BE_RW_CNAME=${A_CNAME}
TO_BE_RO_CNAME=${B_CNAME}
fi
R53_CHANGE=`echo -e "
{
\"Comment\": \"Flip CNAMEs\",
\"Changes\": [
{
\"Action\" : \"UPSERT\",
\"ResourceRecordSet\" : {
\"Name\" : \"${RW_CNAME}.\",
\"Type\" : \"CNAME\",
\"TTL\" : 60,
\"ResourceRecords\" : [{ \"Value\": \"${TO_BE_RW_CNAME}.\" }]
}
},
{
\"Action\" : \"UPSERT\",
\"ResourceRecordSet\" : {
\"Name\" : \"${RO_CNAME}\",
\"Type\" : \"CNAME\",
\"TTL\" : 60,
\"ResourceRecords\" : [{ \"Value\": \"${TO_BE_RO_CNAME}.\" }]
}
}
]
}
"`
echo -e "\nRoute53 Change Set :\n" | tee -a ${LOG_FILE}
echo ${R53_CHANGE} | tee -a ${LOG_FILE}
echo ${R53_CHANGE} > ${CHANGE_SET_FILE}
echo -e "\nCommand to Execute : " | tee -a ${LOG_FILE}
echo -e "\naws route53 change-resource-record-sets --hosted-zone-id ${ZONE_ID} \
--change-batch ${CONF_FILE} \n" | tee -a ${LOG_FILE}
echo -e "\nExecution Result :\n"
aws route53 change-resource-record-sets --hosted-zone-id ${ZONE_ID} \
--change-batch ${CONF_FILE} | tee -a ${LOG_FILE}
echo -e "\nAfter Change :\n "
aws route53 list-resource-record-sets --hosted-zone-id ${ZONE_ID} | tee -a ${LOG_FILE}
Test the solution
The following screenshot shows the Route 53 console view of the domain mydbdomain
before the switchover. The primary database is running on orcl-a-db.mydomain
because orcl-rw.mydomain
is pointing to that.
The following SQL displays the current role of both primary and standby databases and host_name they are currently running on.
[oracle@ip-10-0-0-5 sql]$ cat db_info.sql
ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD:HH24:MI';
set lines 150 pages 200
col HOST_NAME for a30 trunc
select d.NAME, d.db_unique_name, d.DATABASE_ROLE, d.OPEN_MODE, i.INSTANCE_NAME,
i.HOST_NAME, i.STARTUP_TIME
from v$instance i, v$database d;
[oracle@ip-10-0-0-5 sql]$ sqlplus system@orclrw
SQL> @db_info
NAME DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE INSTANCE_NAME HOST_NAME STARTUP_TIME
------ ---------------- -------------- ---------------- ------------------------------ ----------------
ORCL orcl_a PRIMARY READ WRITE orcl ip-10-0-0-5.us-west-2.compute. 2020-05-24:01:47
[oracle@ip-10-0-0-5 sql]$ sqlplus system@orclro
SQL> @db_info
NAME DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE INSTANCE_NAME HOST_NAME STARTUP_TIME
------ ---------------- -------------------- -------------- ------------------------------- ----------------
ORCL orcl_b PHYSICAL STANDBY READ ONLY WITH APPLY orcl ip-10-0-32-5.us-west-2.compute. 2020-05-24:05:50
Let’s initiate the switchover:
[oracle@ip-10-0-0-5 sql]$ dgmgrl /
DGMGRL for Linux: Release 12.2.0.1.0 - Production on Wed May 27 06:42:51 2020
Copyright (c) 1982, 2017, Oracle and/or its affiliates. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected to "orcl_a"
Connected as SYSDG.
DGMGRL> show configuration;
Configuration - awsguard
Protection Mode: MaxPerformance
Members:
orcl_a - Primary database
orcl_b - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS (status updated 39 seconds ago)
DGMGRL> switchover to orcl_b;
Performing switchover NOW, please wait...
Operation requires a connection to database "orcl_b"
Connecting ...
Connected to "orcl_b"
Connected as SYSDBA.
New primary database "orcl_b" is opening...
Oracle Clusterware is restarting database "orcl_a" ...
Switchover succeeded, new primary is "orcl_b"
DGMGRL>
DGMGRL> show configuration;
Configuration - awsguard
Protection Mode: MaxPerformance
Members:
orcl_b - Primary database
orcl_a - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS (status updated 67 seconds ago)
DGMGRL>
Now that the switchover is complete, let’s connect to the database using the orclrw
and orclro
TNS entries using the following code:
[oracle@ip-10-0-0-5 sql]$ sqlplus system@orclrw
SQL> @db_info
NAME DB_UNIQUE_NAME DATABASE_ROLE OPEN_MODE INSTANCE_NAME HOST_NAME STARTUP_TIME
----- -------------- ------------- -------------- ------------------------------ ----------------
ORCL orcl_b PRIMARY READ WRITE orcl ip-10-0-32-5.us-west-2.compute 2020-05-24:05:50
[oracle@ip-10-0-0-5 sql]$ sqlplus system@orclro
SQL> @db_info
NAME DATABASE_ROLE OPEN_MODE INSTANCE_NAME HOST_NAME STARTUP_TIME
----- ----------------- -------------------- -------------- ------------------------------ ----------------
ORCL orcl_a PHYSICAL STANDBY READ ONLY WITH APPLY orcl ip-10-0-0-5.us-west-2.compute. 2020-05-27:06:43
The following screenshot shows the Route 53 console view of the domain mydbdomain after the switchover. The primary database is now running on orcl-b-db.mydomain
because orcl-rw.mydomain
is pointing to that.
Conclusion
Application connectivity to a Data Guard environment can be challenging, especially when the application configuration doesn’t support multiple hostnames or listener endpoints. In this post, we discussed step-by-step details to enable seamless connectivity to Data Guard environments using Route 53 CNAME records, a database trigger, and a shell script. You can use these artifacts to direct the DB connections to the database with the right role seamlessly without application changes. If you are using Data Guard Observer for automated failover, another blog, Setup a high availability design for Oracle Data Guard (Fast-Start Failover) using Amazon Route 53 discusses an alternate mechanism to achieve the same result.