AWS Database Blog

How ktown4u built a custom auto scaling architecture using an Amazon Aurora mixed-configuration cluster to respond to sudden traffic spikes

Ktown4u is an e-commerce company that sells fast-growing K-Pop and Hallyu (Korean Wave) products all over the globe. Since its launch in 2002, the company has grown to over 5 million members and 5,000 fan clubs across 200 countries, with its services offered in 6 languages. Since transitioning from B2B to B2C, the company has grown 15-fold in a short period of time, with sales exceeding 200 billion won in 2021.

Ktown4u develops and operates cloud-based systems including user services, operational systems, and a logistics system (WMS). In 2022, the company resolved long-standing bugs and made improvements to handle traffic spikes. As of 2023, it is working on improving the product structure, upgrading logistics, and replacing a giant monolithic legacy system with a microservices architecture (MSA).

In this post, we share how ktown4u built a custom auto scaling architecture using an Amazon Aurora mixed-configuration cluster to respond to sudden traffic spikes.

Challenges with instantaneous traffic spikes despite Aurora Auto Scaling

When marketing events occur, such as the release or promotion of a new album by a famous K-pop artist, ktown4u handles tens of times its usual traffic. Although the company makes heavy use of Amazon ElastiCache to reduce latency and keep the service reliable, several APIs that require real-time database queries put excessive strain on the database. To mitigate the database load caused by these unforeseeable events, ktown4u actively uses Aurora Auto Scaling.

During such events, ktown4u expands the number of Aurora replicas through Aurora Auto Scaling or proactively adds multiple read-only instances in anticipation of traffic surges. Despite these efforts, the company runs into limitations, particularly in the following scenarios:

  • Service disruptions resulting from abrupt and unpredictable spikes in traffic load.
  • The delay before read-only instances are detected and integrated, which can take 5-6 minutes following a scale-out event.
  • The cost of provisioning read-only instances, which must be planned hours in advance depending on the nature of the business.
  • The difficulty of accurately gauging the scale of an impending event, which leads to over-reaction and wasted resources.

To meet these business requirements, effectively handle erratic traffic fluctuations, and at the same time reduce the time and cost invested in Aurora Auto Scaling, ktown4u needed a solution tailored to its unique circumstances.

Improvement ideas

If an additional Amazon Aurora instance could handle traffic during the approximately 5-6 minutes it takes to create a read-only instance, the service failure issue could be solved. An effective solution is an additional instance that handles traffic only while read-only instances are being created, and that is only charged when it receives traffic. This configuration would also enable cost optimization by reducing the number of read-only instances you spin up before responding to an event. The Aurora metrics available in Amazon CloudWatch have a resolution of 1 minute; higher-resolution metrics are needed to quickly detect read-only instance load caused by traffic spikes.

This idea can be summarized as follows:

  • High-resolution metrics to address database load
  • Additional instance on a pay-as-you-go basis
  • Traffic distribution to an additional instance when traffic exceeds a specific threshold
  • An automated operations architecture

Amazon Aurora hybrid (mixed-configuration) clusters

Amazon Aurora DB clusters are configured with two types of DB instances:

  • Primary DB instance – Supports read and write operations, and runs data modifications on the cluster volume
  • Aurora replica (read-only instance) – Connects to the same storage volume as the primary DB instance and supports read operations only

One of the key architectural features of the Amazon Aurora database is the separation of compute and storage. Aurora storage automatically grows and shrinks based on the amount of data in the database.

This feature of the Amazon Aurora database architecture makes it possible to configure a mixed-configuration cluster by adding Serverless v2 instances to a provisioned cluster. Amazon Aurora Serverless v2 automatically scales CPU and memory resources without disrupting operations. Database capacity for Aurora Serverless v2 is measured in Aurora Capacity Units (ACUs) and is billed per second. Each ACU is a combination of approximately 2 GB of memory with corresponding CPU and networking. The minimum Aurora Serverless v2 capacity that you can define is 0.5 ACUs and the maximum is 128 ACUs.
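
Because capacity is billed per second, the cost of absorbing a short burst can be estimated directly from ACU-seconds. The following sketch uses a hypothetical per-ACU-hour price for illustration; check current Aurora Serverless v2 pricing for real values.

```python
ACU_MEMORY_GB = 2  # each ACU maps to roughly 2 GB of memory


def burst_cost(acus: float, seconds: int, price_per_acu_hour: float) -> float:
    """Estimate the cost of running `acus` capacity units for `seconds`.

    `price_per_acu_hour` is an assumed example rate, not AWS pricing.
    """
    return acus * seconds * price_per_acu_hour / 3600


# Example: 8 ACUs absorbing a 6-minute spike at an assumed $0.12 per ACU-hour
cost = burst_cost(8, 360, 0.12)
```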

You can add an Aurora Serverless v2 read-only instance to a previously provisioned Aurora cluster, creating an Aurora mixed-configuration cluster as shown in the following figure. Ktown4u calls this database architecture “the Aurora hybrid cluster.”

Solution overview

The custom Aurora Auto Scaling architecture handles database load with the following logic:

  1. An Aurora Serverless v2 read-only instance previously added to a provisioned Aurora cluster has a weight of zero (0). Because of this, the Aurora Serverless v2 read-only instance doesn't receive traffic during normal times.
  2. If the average CPU of the read-only instances of the existing provisioned Aurora cluster exceeds the threshold, an alarm is generated.
  3. When the alarm occurs, weights are adjusted to forward some traffic to the Aurora Serverless v2 read-only instance.
    1. After adjusting the weights, continue to check the alarm status to determine whether additional traffic redirection is required.
    2. If the alarm is not resolved and the database is still under heavy load, further adjust the weights so that the Aurora Serverless v2 read-only instance can handle more traffic.
  4. The triggered alarm also scales out the existing Aurora cluster and adds a provisioned read-only instance.
  5. When the additional provisioned read-only instance becomes available, monitor the alarm status and adjust the weights so that traffic is handled by the provisioned read-only instance.
  6. When the alarm is finally resolved, set the weight of the Aurora Serverless v2 read-only instance back to zero so that it no longer receives traffic.
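
The weight-adjustment decisions in the logic above can be sketched as a small pure function. The step size and variable names here are illustrative assumptions, not ktown4u's actual values.

```python
STEP = 20      # assumed weight increment per adjustment cycle
CEILING = 100  # assumed maximum weight for the Serverless v2 record


def next_serverless_weight(current: int, alarm_active: bool,
                           provisioned_ready: bool) -> int:
    """Return the next Route 53 weight for the Serverless v2 record."""
    if not alarm_active:
        return 0                          # final step: stop sending traffic
    if provisioned_ready:
        return max(current - STEP, 0)     # shift traffic back to provisioned
    return min(current + STEP, CEILING)   # shift more traffic to serverless
```

A usage example: while the alarm stays active and no new provisioned replica is available, `next_serverless_weight(40, True, False)` raises the weight to 60; once the alarm resolves, any weight collapses back to 0.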

As a result of these ideas, the following architectural changes were implemented to provide reliable and efficient database usage.

Implementing this architecture consists of the following high-level steps:

  1. Add an Aurora Serverless v2 read-only instance and configure a custom endpoint.
  2. Collect enhanced metrics and create CloudWatch custom metrics.
  3. Adjust the weight of Amazon Route 53 weight-based records using AWS Step Functions.
  4. Set up Aurora Auto Scaling with CloudWatch custom metrics.

Add an Amazon Aurora Serverless v2 read-only instance and configure a custom endpoint

The first step is to add an Amazon Aurora Serverless v2 read-only instance to an existing Aurora cluster to handle sudden spikes in traffic. Configure the Aurora Serverless v2 read-only instance with the lowest failover priority (tier 15) so it won't be promoted to a writer instance in the event of a failover. By creating multiple Aurora Serverless v2 read-only instances, it's possible to handle even more extreme traffic spikes. The following diagram illustrates this architecture.

Set up each custom endpoint as follows:

  • Provisioned custom endpoint – An endpoint that points to the group of provisioned read-only instances in the provisioned Aurora cluster, to handle normal traffic
  • Aurora Serverless v2 custom endpoint – An endpoint that points to the Aurora Serverless v2 read-only instance, to handle load above a threshold

Configure the auto scaling settings so that provisioned instances are only added to the provisioned custom endpoint.

For each of these custom endpoints, create a domain in Route 53 as a private hosted zone and a weighted routing record with the same subdomain. Set the weight of the record associated with the Aurora Serverless v2 custom endpoint to zero.

With this setup, normal traffic sent to the configured DNS (Domain Name System) name is handled by the instances behind the provisioned custom endpoint. Load above the threshold is handled by the Aurora Serverless v2 read-only instance, increasing the reliability of the service at minimal cost.
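
A weighted record like the ones described above can be created or updated with the Route 53 ChangeResourceRecordSets API. The following sketch builds the change batch as a pure function; the record names, endpoint DNS values, and hosted zone ID are placeholders, and the actual call requires AWS credentials.

```python
def weighted_cname_change(record_name: str, endpoint_dns: str,
                          set_identifier: str, weight: int) -> dict:
    """Build an UPSERT for one weighted CNAME record in the private hosted zone."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_identifier,
                "Weight": weight,
                "TTL": 1,  # short TTL so weight changes take effect quickly
                "ResourceRecords": [{"Value": endpoint_dns}],
            },
        }]
    }


def apply_change(hosted_zone_id: str, change_batch: dict):
    import boto3  # requires AWS credentials when actually called
    route53 = boto3.client("route53")
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )


# The Serverless v2 record starts with weight 0 so it receives no traffic
serverless_change = weighted_cname_change(
    "db-read.<PRIVATE_ZONE>", "<SERVERLESS_CUSTOM_ENDPOINT_DNS>",
    "serverless", 0,
)
```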

Collect enhanced metrics and create CloudWatch custom metrics

The following diagram illustrates the architecture to collect and create metrics. It consists of three key steps: collecting high-resolution CloudWatch metrics, refining metrics with a subscription filter, and using embedded metric format (EMF) to apply real-time metrics.

Collect high-resolution CloudWatch metrics

By default, Aurora automatically sends metric data to CloudWatch every 60 seconds. Higher-resolution metrics are required to detect sudden changes in database CPU utilization caused by traffic spikes.

When you use Enhanced Monitoring in Amazon Relational Database Service (Amazon RDS), you can view real-time metrics of the operating system that your database runs on with a resolution of up to 1 second. Enhanced Monitoring is enabled by modifying an instance in the Aurora cluster. Instances created by auto scaling share the settings of the primary instance, so be sure to enable Enhanced Monitoring on the primary instance as well.
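
Enhanced Monitoring can also be enabled programmatically with the RDS ModifyDBInstance API. This sketch assumes a monitoring IAM role already exists; the identifiers below are placeholders.

```python
VALID_INTERVALS = {0, 1, 5, 10, 15, 30, 60}  # seconds; 0 disables the feature


def enable_enhanced_monitoring(instance_id: str, monitoring_role_arn: str,
                               interval: int = 1):
    """Turn on 1-second Enhanced Monitoring for one DB instance."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {sorted(VALID_INTERVALS)}")
    import boto3  # requires AWS credentials when actually called
    rds = boto3.client("rds")
    return rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        MonitoringInterval=interval,
        MonitoringRoleArn=monitoring_role_arn,
        ApplyImmediately=True,
    )
```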

When the Enhanced Monitoring feature is enabled, the metrics are stored in a CloudWatch Logs group named RDSOSMetrics. Ktown4u configured the high-resolution metrics from the metrics extracted from these logs.

Refine metrics with a subscription filter

A CloudWatch Logs subscription filter subscribes to information received as logs in CloudWatch and sends it to services such as AWS Lambda, Amazon Kinesis, and more. The RDSOSMetrics log group collects metrics from all instances with Enhanced Monitoring enabled, so we used a subscription filter to refine only the metrics of the instances targeted by the provisioned custom endpoint set up earlier.

The following Python code is an example of processing the compressed logs sent in the Lambda event to make them readable:

import json
import base64
import gzip

def lambda_handler(event, context):
    # CloudWatch Logs delivers log events base64-encoded and gzip-compressed
    data = event["awslogs"]["data"]
    decoded_data = base64.b64decode(data)
    decompressed_data = gzip.decompress(decoded_data)
    metric = json.loads(json.loads(decompressed_data)["logEvents"][0]["message"])

Next, extract only the metrics of the read-only instances that correspond to the custom endpoints set up in the previous step. However, metrics can't be extracted based on custom endpoints directly. Therefore, we used the describe_db_instances and describe_db_clusters APIs of Amazon RDS provided by AWS Boto3 to extract the CPU usage metrics of read-only instances that are in the available state:

import boto3

rds = boto3.client("rds")
available_status = ["available"]

instance_with_status = rds.describe_db_instances(
    Filters=[{"Name": "db-cluster-id", "Values": ["<CLUSTER_NAME>"]}]
)["DBInstances"]

instance_with_primary = rds.describe_db_clusters(DBClusterIdentifier="<CLUSTER_NAME>")[
    "DBClusters"
][0]["DBClusterMembers"]

# Merge per-instance status with cluster membership information
instances = []
for i in instance_with_status:
    for c in instance_with_primary:
        if i["DBInstanceIdentifier"] == c["DBInstanceIdentifier"]:
            instances.append(dict(i, **c))
            break

# Keep only provisioned read replicas that are in the available state
available_provisioned_read_replicas = [
    instance["DBInstanceIdentifier"]
    for instance in instances
    if instance["DBInstanceStatus"] in available_status
    and instance["DBInstanceClass"] != "db.serverless"
    and not instance["IsClusterWriter"]
]

Use EMF to apply real-time metrics

You can use EMF for fast metric ingestion. EMF is a structured log format customized for CloudWatch that has the advantage of quick metric ingestion; it can be used when real-time metrics (such as for failure response) are required. For Lambda functions using the Python runtime, it's straightforward to implement with the metric_scope decorator.

from aws_embedded_metrics import metric_scope
from aws_embedded_metrics.storage_resolution import StorageResolution

@metric_scope
def lambda_handler(event, context, metrics):
    # ... extract and convert metric data ...
    metrics.set_namespace("<METRIC_NAMESPACE>")
    metrics.set_dimensions({"<METRIC_DIMENSION_NAME>": "<METRIC_DIMENSION_VALUE>"})
    metrics.put_metric("<METRIC_NAME>", <METRIC_VALUE>, "Percent", StorageResolution.HIGH)

With EMF, you can view the value of your refined custom metric (CPU usage of the target instance) in CloudWatch in less than 1 second.

Adjust the weight of Route 53 weight-based records using Step Functions

The custom Aurora Auto Scaling architecture is configured to generate alarm events to handle high loads on the database and invoke Step Functions to adjust the traffic ratio (record weights on Route 53) between the provisioned read-only instance and the Aurora Serverless v2 read-only instance. The following diagram illustrates this workflow, which consists of two key steps: configuring read-only instance load alarms and event triggers, and configuring a traffic ratio using Step Functions.

Ktown4u manages separate Step Functions workflows to handle increasing and decreasing database load. The configuration described below is based on the workflow invoked under increasing load.

Configure read-only instance load alarms and event triggers

Create a CloudWatch alarm using the custom metrics you created earlier. You can increase system reliability by requiring fewer data points to detect high database load and more data points to confirm low database load. By using custom metrics rather than AWS namespace metrics, you can reduce the alarm data point period to as low as 10 seconds.
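
Such a high-resolution alarm can be created with the CloudWatch PutMetricAlarm API. The 10-second period is only valid for high-resolution custom metrics; the names, threshold, and data point counts below are placeholder assumptions.

```python
def cpu_alarm_params(threshold: float, period: int = 10,
                     evaluation_periods: int = 3,
                     datapoints_to_alarm: int = 2) -> dict:
    """Assemble PutMetricAlarm arguments for the refined CPU custom metric."""
    return {
        "AlarmName": "<ALARM_NAME>",
        "Namespace": "<METRIC_NAMESPACE>",
        "MetricName": "<METRIC_NAME>",
        "Dimensions": [{"Name": "<DIMENSION_NAME>", "Value": "<DIMENSION_VALUE>"}],
        "Statistic": "Average",
        "Period": period,  # 10 or 30 seconds for high-resolution alarms
        "EvaluationPeriods": evaluation_periods,
        "DatapointsToAlarm": datapoints_to_alarm,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(params: dict):
    import boto3  # requires AWS credentials when actually called
    boto3.client("cloudwatch").put_metric_alarm(**params)
```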

Configure a traffic ratio using Step Functions

The CloudWatch alarm event occurs only when the alarm changes state from OK to ALARM or back. If traffic continues to increase gradually even after a spike, a single Route 53 weight adjustment is not sufficient to ensure stability. After adjusting the weights, it is recommended to continuously check the status of the database and use Step Functions to verify stability. The following diagram illustrates the Step Functions workflow.

The Step Functions workflow triggered by an alarm performs the following actions:

  1. Modify the weights of the records associated with the Aurora Serverless v2 read-only instance on Route 53.
  2. Wait at least the duration of the CloudWatch alarm's data point cycle so that the adjusted weights are reflected in actual database usage and in the CloudWatch alarm.
  3. Verify whether the CloudWatch alarm is resolved.
    1. If the alarm is resolved, exit the Step Functions state machine.
    2. If the alarm is not resolved, adjust the weight of the Route 53 record again.
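
Assuming the workflow uses Step Functions' AWS SDK service integrations rather than Lambda tasks, the loop above might look roughly like the following state machine sketch; the hosted zone ID, record names, alarm name, weight, and wait time are all placeholders, and the reset-to-zero step is omitted for brevity.

```json
{
  "StartAt": "AdjustWeight",
  "States": {
    "AdjustWeight": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:route53:changeResourceRecordSets",
      "Parameters": {
        "HostedZoneId": "<HOSTED_ZONE_ID>",
        "ChangeBatch": {
          "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
              "Name": "<RECORD_NAME>",
              "Type": "CNAME",
              "SetIdentifier": "serverless",
              "Weight": 20,
              "TTL": 1,
              "ResourceRecords": [{"Value": "<SERVERLESS_ENDPOINT_DNS>"}]
            }
          }]
        }
      },
      "Next": "WaitForDatapoints"
    },
    "WaitForDatapoints": { "Type": "Wait", "Seconds": 30, "Next": "CheckAlarm" },
    "CheckAlarm": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:cloudwatch:describeAlarms",
      "Parameters": { "AlarmNames": ["<ALARM_NAME>"] },
      "Next": "IsResolved"
    },
    "IsResolved": {
      "Type": "Choice",
      "Choices": [
        { "Variable": "$.MetricAlarms[0].StateValue", "StringEquals": "OK", "Next": "Done" }
      ],
      "Default": "AdjustWeight"
    },
    "Done": { "Type": "Succeed" }
  }
}
```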

Instead of separate Lambda functions, Step Functions' built-in features were used. This allowed ktown4u to reduce Lambda usage time and optimize cost.

Set up Aurora Auto Scaling with CloudWatch custom metrics

The auto scaling feature provided by the Aurora cluster works based on an alarm about the average CPU usage of the read-only instances included in the Aurora cluster, as shown in the following screenshot.

CPU utilization for an Aurora Serverless v2 instance is measured relative to the maximum Aurora Capacity Units (ACUs) in use, and ACUs scale automatically up to 128. Therefore, in an Aurora mixed-configuration cluster, the average CPU usage of the read-only instances is calculated lower than the average CPU of the provisioned read-only instances alone, preventing auto scaling from occurring at the desired CPU usage.

Therefore, we configured custom metrics so that auto scaling occurs based on the actual resources in use, namely the average CPU usage of the provisioned read-only instances only. The following diagram illustrates the workflow.

To configure auto scaling on the Aurora cluster with custom metrics, you need to use the Application Auto Scaling API. Application Auto Scaling implements auto scaling for AWS services such as Amazon DynamoDB, Amazon Elastic Container Service (Amazon ECS), Aurora, and Amazon EMR. Its advantage is that it allows you to configure scaling policies with your own metrics instead of the baseline metrics supported by Aurora clusters, such as average CPU usage of read-only instances and average connections of read-only instances.

We used the AWS Command Line Interface (AWS CLI) and the following custom metrics to configure auto scaling on the Aurora cluster.

  1. Register a target to be used for auto scaling and control the minimum and maximum capacity:
    aws application-autoscaling register-scalable-target \
      --service-namespace rds \
      --scalable-dimension rds:cluster:ReadReplicaCount \
      --resource-id cluster:"<CLUSTER_NAME>" \
      --min-capacity "<MIN_CAPACITY>" \
      --max-capacity "<MAX_CAPACITY>"
  2. Determine which metrics and policies to use for auto scaling:
    aws application-autoscaling put-scaling-policy \
      --policy-name "<POLICY_NAME>" \
      --service-namespace rds \
      --resource-id cluster:"<CLUSTER_NAME>" \
      --scalable-dimension rds:cluster:ReadReplicaCount \
      --policy-type TargetTrackingScaling \
      --target-tracking-scaling-policy-configuration file://metric.json

Ktown4u used a custom metric with per-second resolution so that auto scaling could be applied quickly. The policy type is TargetTrackingScaling, which is configured to handle common load situations. TargetTrackingScaling is a policy designed to increase read replicas at a moderate rate based on current CPU usage. Depending on the workload characteristics of the service, you can instead set the policy to increase or decrease by a specified amount using StepScaling.

The metric.json file was constructed using the high-resolution custom metrics created in the previous step:

{
  "TargetValue": "<TARGET_VALUE>",
  "CustomizedMetricSpecification": {
    "MetricName": "<METRIC_NAME>",
    "Namespace": "<METRIC_NAMESPACE>",
    "Dimensions": [
      {
        "Name": "<DIMENSION_NAME>",
        "Value": "<DIMENSION_VALUE>"
      }
    ],
    "Statistic": "Average",
    "Unit": "Percent"
  },
...
}

With an Aurora mixed-configuration architecture that uses an Aurora Serverless v2 read-only instance, it's possible to run a higher-traffic service in a reliable manner. Taking advantage of this, ktown4u was able to configure a higher TargetValue than the one they had configured for auto scaling on their existing provisioned Aurora cluster.

Validate the custom Aurora Auto Scaling architecture

To validate the solution architecture, we used the following test environment:

  • Aurora mixed-configuration cluster
    • Primary
      • Provisioned instance – db.r5.4xlarge (16 vCPU) x 1
    • Replica
      • Provisioned instance – db.r5.4xlarge (16 vCPU) x 1
      • Serverless v2 instance – 0.5 to 128 ACUs x 1
  • Scale-out threshold: 40% (average CPU utilization)
  • Provisioned/serverless traffic switching threshold: 60% (average CPU utilization)
  • Aurora Serverless v2 read-only instance traffic switching ratio: 20% (based on provisioned instance)

The test consisted of the following steps:

  1. Load the database with a spike in traffic right after the test starts.
  2. When average CPU utilization exceeds the 40% threshold, a scale-out occurs.
  3. Traffic starts shifting to the serverless instance when utilization exceeds the 60% threshold.
  4. Fine-tune the traffic ratio between the provisioned and serverless endpoints.
  5. CPU utilization converges to the band between the scale-out and traffic-shifting thresholds.
  6. The added read replica reduces CPU load on the provisioned instance.
  7. Repeat the preceding process.
  8. End the test.

The following chart shows the average CPU utilization for a cluster consisting only of previously provisioned Aurora read-only instances.

In this test, CPU usage reached 100% and a service failure occurred for about 10 minutes due to the large amount of traffic generated immediately after the test started.

Compared to the previous Aurora cluster configuration, which maintained 100% CPU usage for about 10 minutes during the scale-out, the Aurora mixed-configuration cluster detected a sudden increase in CPU load and switched traffic to the Aurora Serverless v2 read-only instance to maintain stable CPU usage, as shown in the following chart.

After auto scaling triggered the scale-out, the percentage of traffic routed to the Aurora Serverless v2 read-only instance gradually decreased, and after the scale-out completed, traffic was shifted back to the provisioned instances.

Observed improvements

With the Aurora mixed-configuration cluster that contains both Aurora Serverless v2 and a provisioned Aurora cluster, ktown4u has seen significant system improvements in the following areas:

  • Service reliability – The Aurora mixed-configuration cluster architecture can safely absorb traffic through the Aurora Serverless v2 instance, even when unexpected global traffic spikes occur before an event.
  • System performance – Stable usage of the database also correlates to the performance of the application. Tests show there is a significant difference in response time to test queries under the initial database overload, averaging 80 milliseconds in the mixed configuration and 220 milliseconds in the non-mixed configuration.
  • Reduced time for auto scaling – The alarm data cycle for metrics with AWS namespaces is at least 1 minute. However, a separate custom metric allows you to reduce the data point frequency to as low as 10 seconds. With this higher resolution metric, auto scaling occurs faster.
  • Cost savings – Previously, ktown4u would spin up read-only instances at least 1 hour in advance to respond to events. Because service reliability is the top priority, read-only capacity was increased by at least 20% more than the predicted traffic. Moreover, some events were ramped up hours in advance because the exact time couldn't be predicted. According to ktown4u's estimates, this resulted in dozens of hours of cost leakage each month. The Aurora mixed-configuration cluster architecture reduces the need to deploy unnecessary instances, saving approximately 30% of the event cost.

With this Aurora mixed-configuration architecture, ktown4u was able to quickly restore service reliability through rapid switching in response to unexpected sudden spikes in traffic. Additionally, ktown4u has seen significant improvements in system performance, auto scaling responsiveness, and cost reductions.

Conclusion

Minimizing service interruptions by responding appropriately to database loads will help you prevent loss and keep your business running smoothly.

Amazon Aurora Serverless v2 is supported for Amazon Aurora MySQL-Compatible Edition and PostgreSQL-Compatible Edition. If your Aurora cluster runs a lower engine version, refer to Upgrading or converting an existing cluster to use Aurora Serverless v2, or use the blue/green deployment feature to perform database upgrades more safely and quickly.

The improved custom auto scaling architecture that utilizes an Aurora hybrid (mixed-configuration) cluster was straightforward and quick to build with the various API-based functions and logging provided by AWS. In addition, through multiple architectural improvements, we significantly reduced the operational and management burden by automating operations based on CloudWatch alarms driven by CloudWatch Logs. A notable advantage of this architecture is that the Aurora Serverless v2 read-only instance can be used for purposes other than responding to database load.

We genuinely hope that ktown4u's experience building an Aurora mixed-configuration cluster offers a useful solution for those facing similar database challenges around reliability, cost, and performance.

We have adapted the concepts from this post into a deployable solution, now available as Guidance for Handling Data during Traffic Spikes on AWS in the AWS Solutions Library. To get started, review the architecture diagrams and the corresponding AWS Well-Architected framework, then deploy the sample code to implement the Guidance into your workloads.

We welcome your feedback. If you have any questions or suggestions, leave them in the comments section.


About the Authors

Seung-hak Lee is a DevOps Engineer at ktown4u, in charge of architectural design in the AWS environment and of system configuration to disseminate and foster a DevOps culture, with the goal of achieving higher business results.

Hyeong-sik Yang is a Solutions Architect at AWS. He develops architectures that meet the needs of retail customers, based on his in-depth experience operating various databases and systems, and provides optimal cloud solutions for achieving business goals.