AWS for SAP
Increase Availability of SAP Convergent Mediation using AWS Auto Scaling
Introduction
SAP Convergent Mediation (SAP CM) by DigitalRoute is a product in the SAP Billing and Revenue Innovation Management (SAP BRIM) solution. Customers deploy SAP CM on AWS to track and orchestrate their billing process. Further details and links can be found in the SAP Help portal – SAP Convergent Mediation by DigitalRoute.
There are two deployment types for an SAP BRIM solution depending on usage. The first deployment type is for offline/batch applications (e.g. batch billing mediation) where short periods of downtime are not disruptive to the business. The second deployment type is for real-time applications (e.g. online billing mediation) which require zero service interruptions. Customers who use SAP CM for a batch scenario will often use third-party clustering solutions at the application tier to maintain higher degrees of availability in an automated fashion. While this is technically possible, it often increases their overall operational complexity as well as infrastructure support costs.
In this blog, we describe a cost-effective and less complex approach to increase availability of an SAP CM platform server in an automated way using Amazon EC2 Auto Scaling. With this proposed solution, customers can minimize the unavailability of the platform server without the need for cluster management software on the application tier. This solution uses an Auto Scaling group (ASG) with a launch template to initiate a new platform server using a custom Amazon Machine Image (AMI). This design minimizes the installation footprint while also avoiding manual intervention in case a platform server experiences an outage. Optionally, customers can use an Amazon EventBridge rule and an AWS Systems Manager automation document to create an image before instance termination.
This blog provides guidance on how to set up a pilot to test resiliency for the platform container. Before using it in a production environment, additional development and tuning for your environment’s requirements are necessary. Also, for stateful, real-time scenarios in SAP CM which require session information to be persisted, a high availability deployment using external cluster management software will still be required, but will not be covered in this blog.
Overview
In SAP CM, the platform and execution containers are installed on separate hosts. Each container contains at least one pico process of type Platform, Execution context (EC), or Service context (SC). These pico instances are typically configured after the installation of the container. Platform and database host configurations provide storage and services that are essential to the mediation zone system. Execution servers provide scalable processing capacity in the system and redundancy is achieved by having multiple execution servers in different AWS Availability Zones.
The following diagram shows the high-level architecture for SAP Convergent Mediation. An AMI is created from an existing platform server and launched through a launch template along with user data. The user data script configures an Overlay IP address that is allocated to the platform server. Execution servers in SAP CM communicate with the platform server using this Overlay IP. If an issue occurs in the platform container, the Application Load Balancer finds the web interface port (default 9000) of the platform container unreachable and reports the unhealthy status of the instance to the ASG. The ASG then terminates the faulty instance and launches a new platform server based on the configured AMI. The new instance registers itself as a new target, and the Application Load Balancer forwards the next request to it. To troubleshoot the root cause of the failure, it is possible to take a backup of the instance before termination using a lifecycle hook, an Amazon EventBridge rule, and an AWS Systems Manager automation document.
The SAP CM web interface health check detects any anomalies with the platform pico process only. This doesn’t cover any additional service contexts (SCs) if manually configured to run on the platform instance.
Architecture Description
- Route 53 is a highly available and scalable Domain Name System (DNS) web service.
- Application Load Balancer (ALB) serves as the single point of contact for client connections, and routes the requests to the platform container.
- Auto Scaling Group helps to maintain Amazon EC2 instance availability.
- Amazon EFS provides the SAP CM storage shared across the platform and execution containers.
- Multiple execution containers run in multiple Availability Zones (AZs) to increase redundancy. If an execution container fails, batches running in it need to be restarted manually.
- A Pacemaker cluster provides SAP HANA database high availability. This blog does not address the resilience requirements of the database layer, although in most cases a Pacemaker cluster is used for this purpose. Further details can be found in the SAP HANA on AWS High Availability Configuration Guide for SLES and RHEL.
- An AWS Systems Manager automation document creates an AMI of the EC2 instance before termination.
Prerequisites
- Installation of SAP CM is done as per the installation guide in the SAP Help Portal – SAP Convergent Mediation by DigitalRoute. In the example below, cmplat is the SAP CM platform container and cmexec1, cmexec2 are the SAP CM execution containers. The container name is mz01 for cmplat, ec01 for cmexec1 and ec02 for cmexec2.
- Figure 2 shows a sample AWS Identity and Access Management (IAM) policy assigned to the platform container that grants the permissions used by the launch script: updating the route table, describing instances, and modifying the source/destination check. Replace the AWS Region, account number, and route table details accordingly.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:ReplaceRoute",
      "Resource": "arn:aws:ec2:<region>:<account-number>:route-table/<route-table-id>"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:ModifyInstanceAttribute"
      ],
      "Resource": "*"
    }
  ]
}
Figure 2: Enable access to update route table entries
Solution
The flow is as follows:
1. Disable source/destination checks on the SAP CM platform server.
2. Add the Overlay IP to the IP configuration of the active SAP CM platform server.
3. In the route table, define a route for the Overlay IP with the ENI of the active SAP CM server as its target.
4. Modify the platform and execution container properties to point to the Overlay IP.
5. Take an AMI of the platform EC2 instance.
6. Create a launch template.
7. Create an Auto Scaling group.
8. Attach the existing platform EC2 instance to the ASG.
9. Create a target group with an HTTPS health check and /mz/main as the health check path.
10. Update the ASG with the Load Balancer target group.
11. Create an Application Load Balancer with the target set as the target group created in Step 9.
12. Create a lifecycle hook in the ASG and create an SSM document to take an AMI of the instance. Create an Amazon EventBridge rule and add the SSM document as the EventBridge rule target.
Disable source/destination checks in the SAP CM platform server
In the Amazon EC2 console, select the EC2 instance for the SAP CM platform, then choose Actions, Networking, Change source/destination check, and disable the check.
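The same change can be made with the AWS CLI. A minimal sketch; the instance ID shown is a placeholder for your cmplat server:

```shell
# Disable the source/destination check so the instance can receive
# traffic addressed to the Overlay IP (instance ID is a placeholder).
aws ec2 modify-instance-attribute \
    --instance-id i-0123456789abcdef0 \
    --no-source-dest-check
```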
Overlay IP & Route table update
One Overlay IP, an IP address that exists outside of the CIDR range of the VPC, is assigned to the cmplat server. In this example, 192.168.0.1 is used as the Overlay IP.
cmplat:~ # ip address add 192.168.0.1/32 dev eth0
cmplat:~ # ip a show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 02:01:60:61:b7:17 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.224/24 brd 10.0.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.0.1/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::1:60ff:fe61:b717/64 scope link
       valid_lft forever preferred_lft forever
cmplat:~ #
Figure 4: Adding the overlay IP address to the cmplat server
Update the route table with the Overlay IP as the destination and the ENI of the platform server as the target.
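This step can be scripted with the AWS CLI. A sketch; the route table and ENI IDs are placeholders for your environment:

```shell
# Add a route that sends traffic for the Overlay IP to the platform
# server's primary ENI (route table and ENI IDs are placeholders).
aws ec2 create-route \
    --route-table-id rtb-0cbf476881fca9021 \
    --destination-cidr-block 192.168.0.1/32 \
    --network-interface-id eni-0123456789abcdef0
```

If the route already exists, use replace-route instead, as the user data script in Figure 9 does.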
Modify the properties file
Modify the property pico.rcp.platform.host in {MZ_HOME}/common/config/cell/default/master/cell.conf as below.
MZ_HOME specifies the SAP CM software installation location and is shared across the platform and execution container servers.
pico.rcp.platform.host="{chosen overlay IP}"
e.g. pico.rcp.platform.host="192.168.0.1"
Modify the property "address" in {MZ_HOME}/common/config/cell/default/master/containers/mz01/container.conf.
"address" : "{chosen overlay IP}"
e.g. "address" : "192.168.0.1"
The property pico.rcp.server.host (for execution containers) in the configuration files below points to the local IP address of the respective execution container and doesn't need any modification.
{MZ_HOME}/common/config/cell/default/master/containers/ec01/container.conf
{MZ_HOME}/common/config/cell/default/master/containers/ec02/container.conf
pico.rcp.server.host="{local IP address}"
Take AMI
Take an AMI of the platform container. AMI Name: cmplatimage
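You can also create the image from the AWS CLI. A sketch with a placeholder instance ID; note that --no-reboot keeps the instance running but may capture an inconsistent file system state, so consider stopping services first:

```shell
# Create the cmplatimage AMI from the running platform instance.
aws ec2 create-image \
    --instance-id i-0123456789abcdef0 \
    --name cmplatimage \
    --no-reboot
```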
Create Launch Template
Create a launch template using the AMI taken in the previous step (cmplatimage). This launch template will be used for the Auto Scaling group in the next step. The user data section in the launch template is used to specify a configuration script that will run during launch which will take actions to add the Overlay IP, update the route table, and cleanly restart required platform services.
In the steps below, we create a launch template cmlt.
You can use the user data in Figure 9 in the launch template. It adds the Overlay IP to the IP configuration of the newly launched instance, updates the route table with the ENI of the new instance, and restarts the platform instance.
#!/bin/bash -x
hostnamectl set-hostname --static cmplat
echo cmplat > /etc/hostname
ip address add 192.168.0.1/32 dev eth0
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
instance_id=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 --region us-east-1 modify-instance-attribute --instance-id=$instance_id --no-source-dest-check
eni_id=$(aws ec2 --region us-east-1 describe-instances --instance-ids $instance_id --query 'Reservations[*].Instances[*].NetworkInterfaces[*].{NetworkInterfaceId:NetworkInterfaceId}' --output text)
aws ec2 --region us-east-1 replace-route --route-table-id rtb-0cbf476881fca9021 --destination-cidr-block 192.168.0.1/32 --network-interface-id $eni_id
if su -c 'mzsh restart platform' - mzadmin; then
  echo "Platform was started with rc 0"
else
  echo "Platform was not started correctly in first attempt. Retrying"
  su -c 'mzsh shutdown platform' - mzadmin
  su -c 'mzsh startup platform' - mzadmin
fi
su -c 'mzsh system start' - mzadmin
Figure 9: Script for user data of cmlt launch template
EC2 Auto Scaling Group
Create an EC2 Auto Scaling group (ASG) with desired capacity settings. In the below example we created an ASG named cmasg. Launch template cmlt is used to launch a new EC2 instance.
After the ASG is created, attach the existing platform instance (cmplat) to the ASG (cmasg).
The health check grace period defaults to 300 seconds when an ASG is created. We suggest setting it to a minimum of 600 seconds so that initialization of the new EC2 instance can complete; this prevents unnecessary termination of the platform instance. You can update this in the ASG health check settings.
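The equivalent AWS CLI calls look roughly as follows; the subnet and instance IDs are placeholders, and attaching the running instance increments the group's desired capacity:

```shell
# Create the ASG from the cmlt launch template with an extended
# health check grace period (subnet IDs are placeholders).
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name cmasg \
    --launch-template LaunchTemplateName=cmlt,Version='$Latest' \
    --min-size 0 --max-size 1 --desired-capacity 0 \
    --vpc-zone-identifier "subnet-0aaa0aaa,subnet-0bbb0bbb" \
    --health-check-grace-period 600

# Attach the existing platform instance (ID is a placeholder).
aws autoscaling attach-instances \
    --auto-scaling-group-name cmasg \
    --instance-ids i-0123456789abcdef0
```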
Create Target Group
Create a target group (cmplatform-tg-9000) with protocol HTTPS, port 9000, and /mz/main (the SAP CM web interface path) as the health check path. Port 9000 is configured by the install.str.mz_platform parameter during the SAP CM installation.
Register the cmplat server as a target by using its instance ID, following the steps in Register and deregister targets by instance ID.
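Both steps can be sketched with the AWS CLI; the VPC ID, target group ARN, and instance ID are placeholders:

```shell
# Create the target group with the SAP CM web interface health check.
aws elbv2 create-target-group \
    --name cmplatform-tg-9000 \
    --protocol HTTPS --port 9000 \
    --vpc-id vpc-0123456789abcdef0 \
    --health-check-protocol HTTPS \
    --health-check-path /mz/main

# Register the cmplat instance by its instance ID.
aws elbv2 register-targets \
    --target-group-arn <target-group-arn> \
    --targets Id=i-0123456789abcdef0
```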
Update ASG with Load Balancer target group
Attach the target group cmplatform-tg-9000 to the cmasg ASG under Load Balancer target groups, and update the desired capacity to 1.
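A CLI sketch of the same step, with a placeholder target group ARN:

```shell
# Attach the target group to the ASG and raise the desired capacity.
aws autoscaling attach-load-balancer-target-groups \
    --auto-scaling-group-name cmasg \
    --target-group-arns <target-group-arn>

aws autoscaling set-desired-capacity \
    --auto-scaling-group-name cmasg \
    --desired-capacity 1
```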
Create Load Balancer
Create the Application Load Balancer (cmplatform) with a listener that forwards requests to the target group (cmplatform-tg-9000).
For encrypted communication, an SSL certificate is required. You can use AWS Certificate Manager (ACM) to provision, manage, and deploy public and private SSL/TLS certificates.
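Creating the load balancer and its HTTPS listener can be sketched as follows; the subnet and security group IDs, the load balancer ARN, and the ACM certificate ARN are placeholders, and an internal scheme is assumed:

```shell
# Create the ALB (internal scheme assumed; adjust for your setup).
aws elbv2 create-load-balancer \
    --name cmplatform \
    --scheme internal \
    --subnets subnet-0aaa0aaa subnet-0bbb0bbb \
    --security-groups sg-0123456789abcdef0

# Add an HTTPS listener that forwards to the target group, using an
# SSL/TLS certificate provisioned in AWS Certificate Manager.
aws elbv2 create-listener \
    --load-balancer-arn <load-balancer-arn> \
    --protocol HTTPS --port 443 \
    --certificates CertificateArn=<acm-certificate-arn> \
    --default-actions Type=forward,TargetGroupArn=<target-group-arn>
```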
At this point, the cmplat instance has a healthy status in the target group.
End result testing
Based on the configuration performed in the previous steps, the ASG performs periodic health checks on the instances in the group and maintains the desired capacity in case of an outage of the SAP CM platform.
If the EC2 instance becomes unavailable or the platform pico process fails, the instance turns unhealthy in the target group, as shown in the screenshot below. Once the instance reaches unhealthy status, the ASG (cmasg) launches a new instance based on the launch template (cmlt).
In the test below, we manually killed the platform process, which turned the target health check status to "unhealthy" in the target group. The ASG then terminated the unhealthy instance and replaced it with a new instance launched from the launch template. The new cmplat instance starts with a health check status of "initial"; once instance initialization completes, the status turns to "healthy" and the platform is available again without manual intervention.
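To reproduce the test, you can stop the platform pico process on the cmplat server and watch the target health change; the target group ARN is a placeholder:

```shell
# Simulate a platform outage; the /mz/main health check then fails
# and the ASG replaces the instance automatically.
su -c 'mzsh shutdown platform' - mzadmin

# From a machine with AWS CLI access, observe the health transition
# from healthy to unhealthy, then initial on the replacement instance.
aws elbv2 describe-target-health \
    --target-group-arn <target-group-arn>
```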
Using AWS SSM to take a backup of an instance before the ASG terminates it
In the event of SAP CM platform unavailability, the ASG terminates the EC2 instance and launches a new one based on the AMI. If you would like to keep a backup image of the instance before termination, you can follow the blog Run code before terminating an EC2 Auto Scaling instance.
Following these steps keeps the troubled instance in the Terminating:Wait state for one hour by default (customizable via the heartbeat timeout in the lifecycle hook settings). The EC2 termination event from the ASG triggers an SSM document that takes a backup of the EC2 instance and then signals the ASG to complete the termination. This gives you a way to troubleshoot and find the cause of the error later.
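The lifecycle hook and the matching EventBridge rule can be sketched as follows; the hook and rule names are examples, and the rule's target would be the SSM automation document described in the linked blog:

```shell
# Hold the instance in Terminating:Wait for up to one hour so a
# backup AMI can be taken before termination completes.
aws autoscaling put-lifecycle-hook \
    --lifecycle-hook-name cm-backup-on-terminate \
    --auto-scaling-group-name cmasg \
    --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
    --heartbeat-timeout 3600 \
    --default-result CONTINUE

# Match the terminate lifecycle action for the cmasg group.
aws events put-rule \
    --name cm-backup-on-terminate-rule \
    --event-pattern '{"source":["aws.autoscaling"],"detail-type":["EC2 Instance-terminate Lifecycle Action"],"detail":{"AutoScalingGroupName":["cmasg"]}}'
```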
Conclusion
In this blog post, you've learned how to increase the availability of an SAP CM platform server using Amazon EC2 Auto Scaling groups, without the complexity or licensing cost of running a third-party cluster. You can use this procedure to gain higher availability and resiliency for an SAP CM system in batch convergent mediation scenarios.
You can find more DigitalRoute documentation in SAP Note 2924977 (SAP Support Portal login required).
Join the SAP on AWS Discussion
In addition to your customer account team and AWS Support channels, AWS provides public Question & Answer forums on our re:Post site. Our SAP on AWS Solution Architecture team regularly monitors the SAP on AWS topic for discussions and questions that could be answered to assist you. If your question is not support-related, consider joining the discussion over at re:Post and adding to the community knowledge base.