AWS Database Blog

Optimize Amazon Aurora PostgreSQL auto scaling performance with automated cache pre-warming

Amazon Aurora is a relational database service offering performance and high availability at scale. Aurora is compatible with both MySQL and PostgreSQL. You can reuse most of your existing application code and tools with little or no change.

Aurora database (DB) clusters consist of one or more DB instances and a cluster volume that contains the data for those DB instances. This cluster volume is a distributed database storage that spans multiple Availability Zones, with each Availability Zone having a copy of the DB cluster data. Read replicas connect to the same cluster volume that the primary DB instance writes into. As a result, all Aurora replicas return the same data for query results with minimal replica lag. This lag is usually much less than 100 milliseconds after the primary instance has written an update.

The low replication lag enables patterns such as auto scaling of Aurora replicas with minimal impact on clients from a data consistency perspective. However, when clients start running queries on a new replica, they notice longer runtimes for the first few executions because the replica's cache is cold. As the replica serves more queries, its cache fills and runtimes improve. Although this is generally not a problem because the database populates the cache on its own, for latency-sensitive applications it can introduce unexpected latency spikes, especially when clients connect through a load-balanced endpoint.

In this post, we focus on how to address the cold cache so clients connecting through a load-balanced endpoint get a consistent experience, regardless of whether the replicas are scaled automatically or manually. We also look at other caching solutions, such as Amazon ElastiCache, a fully managed Memcached-, Redis-, and Valkey-compatible service, that can further improve the overall experience for latency-sensitive applications and, in some situations (such as higher cache hit rates), lead to less frequent auto scaling events for the Aurora read replicas.

Caching options and considerations

Caching contributes to the overall stability and performance of both the application and the database by storing frequently used data in memory for faster retrieval, freeing backend resources to serve other requests. Caching can be applied at various layers of an application: at the result set level, where the backend caches entire database results, typically using a general-purpose caching engine such as Memcached, Redis, or Valkey; and at the database level, where the database manages its own local cache.

Although this post focuses primarily on the local database cache, let's look at how result set caching and local database caching differ and how they complement each other to provide a better experience for clients interacting with the backend.

Local Database Cache

Aurora PostgreSQL and Amazon Relational Database Service (Amazon RDS) for PostgreSQL automatically cache recently queried data blocks into the database’s shared_buffers. The caching that the database does is transparent to the application. As far as the application is concerned, it sends queries to the database and the database manages its local cache. As applications send more queries to the database and the shared_buffers fill up, the database manages the cache eviction and population accordingly depending on the query traffic that it receives.

Since the cache is primarily managed by the database, the following are levers that you can use to adjust and influence what gets stored, and for how long, in the cache:

  • Increasing the shared_buffers size. This allows more data blocks to be stored in the cache. Consult the AWS documentation for further guidance on this parameter.
  • Using the pg_prewarm extension. Running the SQL statement:
    SELECT pg_prewarm('table_name');
    loads the table data into the cache. Because the cache is managed by the database, as more queries run, data blocks belonging to this table may be evicted by the LRU algorithm if they are not frequently accessed.
  • Running a SELECT statement. Data blocks accessed by the query are cached, which gives you more granular control over which blocks of data are cached instead of caching the entire table (see the sketch after this list).
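
The following is a minimal sketch of both approaches from a Node.js (TypeScript) client using the pg package, the same client library style as the solution's Lambda function. The table name, query, and connection settings are illustrative assumptions, not part of the solution:

import { Client } from 'pg';

// Warm the local cache of a freshly provisioned replica in two ways.
async function warmCache(): Promise<void> {
    // Hypothetical connection settings; in practice these come from configuration.
    const client = new Client({ host: 'replica.example.internal', database: 'postgres' });
    await client.connect();

    // Option 1: load an entire table into shared_buffers.
    await client.query("SELECT pg_prewarm('orders')");

    // Option 2: warm only the data blocks that a hot query actually touches.
    await client.query('SELECT avg(price) FROM orders WHERE quantity > 5');

    await client.end();
}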

Result Set Caching using Amazon ElastiCache (Memcached or Redis)

With result set caching, the application populates the cache and also decides when and how specific cache entries would be invalidated or overwritten. This gives the application more granular control over the lifecycle of the cache entries. Standard LRU algorithm logic still applies if the memory utilization exceeds what’s available in the ElastiCache cluster based on the sizing configured during provisioning.

In addition to basic key-value structures, caching engines like Redis and Valkey offer enhanced functionality through various data types. These include:

  1. Sets: Unordered collections of unique strings
  2. Sorted Sets: Collections of unique strings ordered by an associated score
  3. Geospatial indexes: Useful for location-based queries within specific geographic areas

These advanced data types enable ElastiCache clusters to support a wider range of use cases beyond simple caching. As a result, application teams can offload more complex operations to ElastiCache, allowing them to focus primarily on implementing core business logic.

Lastly, general-purpose caches are an independent component of the architecture, which means multiple applications can connect to the same ElastiCache cluster and use the existing cached data for their own purposes.

So how do they complement each other?

Both types of cache pursue the same general idea of improving performance by keeping frequently requested items in memory, but they differ in how this is accomplished and in the level of control the application team has. From an architectural perspective, they're not mutually exclusive; they complement each other to achieve consistently low-latency performance.

With result set caching, one of the major benefits is a reduction in read traffic hitting the underlying database, which can translate to fewer read replicas needed to serve the same or higher traffic. The effectiveness of result set caching depends on factors such as the frequency of updates relative to reads for the same entry, and on the strategy used to keep the cache and database in sync, such as the following:

  • Lazy loading – update the database and invalidate the cache entry. The first request for a new or updated entry results in a cache miss, requiring the backend to retrieve the entry from the database and populate the cache, so it has higher latency. Subsequent requests for the same item result in a cache hit with lower latency (a lazy-loading sketch follows this list).
  • Write-through – write the new or updated entry into both the cache and the database at the same time to keep them in sync, so all reads result in a cache hit. The downsides of this approach are higher memory consumption in the cache cluster, because every record is cached, and slightly elevated latency during writes. The upside is consistently low-latency reads, because everything is cached.
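
As a concrete illustration, here is a simplified lazy-loading sketch in TypeScript using the ioredis and pg clients. The key naming, TTL, table, and endpoints are assumptions for illustration only:

import Redis from 'ioredis';
import { Client } from 'pg';

const cache = new Redis('redis://cache.example.internal:6379'); // hypothetical ElastiCache endpoint
const db = new Client(); // assumes standard PG* environment variables; db.connect() is called at startup

const TTL_SECONDS = 300; // illustrative expiry to bound staleness

// Lazy loading: read from the cache first and fall back to the database on a miss.
async function getProductName(id: string): Promise<string> {
    const key = `product:${id}`;
    const cached = await cache.get(key);
    if (cached !== null) {
        return cached; // cache hit: served from memory with low latency
    }

    // Cache miss: query the database, then populate the cache for later reads.
    const res = await db.query('SELECT name FROM products WHERE id = $1', [id]);
    const name: string = res.rows[0].name;
    await cache.set(key, name, 'EX', TTL_SECONDS);
    return name;
}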

In general, if there’s a cache miss (item is not in the cache), the backend needs to go to the database and run the query. This is where the local database cache comes into the picture. If the query hits a freshly provisioned read replica, there is a latency spike because the relevant data blocks are not in the shared buffers and the database has to read from storage to retrieve the data.

Without result set caching, the database serves all read traffic, potentially necessitating more read replicas. This scenario can lead to increased read replica churn due to auto-scaling, especially during traffic spikes. In such cases, pre-warming the local database cache becomes important for maintaining a consistent overall experience. By utilizing techniques like pg_prewarm or strategic SELECT statements, database administrators can ensure that frequently accessed data is readily available in the cache, reducing latency and improving query performance even as the number of replicas fluctuates.

Solution overview

The high-level architecture of the solution looks like the following diagram.

The workflow includes the following steps:

  1. An EventBridge rule invokes the pre-warmer Lambda function when a new DB instance is provisioned. Because the events that are emitted are at the DB instance level, cluster information is not included in the event payload. The pre-warming function will only run if the DB instance in the event is a member of the DB cluster that’s configured with the pre-warmer function.
  2. The pre-warmer function retrieves the database credentials from the configured AWS Secrets Manager secret, connects to the DB instance, and runs pg_prewarm against all the tables that are configured as part of the solution deployment.
  3. After all the tables have been pre-warmed, the function updates the configured custom endpoint to add the new DB instance (a condensed sketch of this workflow follows the list).
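
The following TypeScript skeleton sketches this workflow using the AWS SDK for JavaScript v3. It is a condensed illustration, not the solution's actual code; the environment variable names are assumptions, and the Secrets Manager and pg_prewarm logic (shown later in this post) is elided:

import {
    RDSClient,
    DescribeDBClustersCommand,
    DescribeDBClusterEndpointsCommand,
    ModifyDBClusterEndpointCommand,
} from '@aws-sdk/client-rds';

const rds = new RDSClient({});
const CLUSTER_ID = process.env.CLUSTER_ID!;   // configured DB cluster identifier
const ENDPOINT_ID = process.env.ENDPOINT_ID!; // pre-created custom endpoint identifier

export async function handler(event: any): Promise<void> {
    const instanceId: string = event.detail.SourceIdentifier;

    // Step 1: the event is instance-level, so confirm the new instance is a
    // member of the DB cluster this deployment is configured for.
    const { DBClusters } = await rds.send(
        new DescribeDBClustersCommand({ DBClusterIdentifier: CLUSTER_ID })
    );
    const isMember = DBClusters?.[0]?.DBClusterMembers?.some(
        (m) => m.DBInstanceIdentifier === instanceId
    );
    if (!isMember) return; // not our cluster; exit quickly

    // Step 2: retrieve credentials from Secrets Manager, connect to the new
    // instance, and run pg_prewarm for each configured table (elided).

    // Step 3: add the pre-warmed instance to the custom endpoint's static members.
    const { DBClusterEndpoints } = await rds.send(
        new DescribeDBClusterEndpointsCommand({ DBClusterEndpointIdentifier: ENDPOINT_ID })
    );
    const current = DBClusterEndpoints?.[0]?.StaticMembers ?? [];
    await rds.send(
        new ModifyDBClusterEndpointCommand({
            DBClusterEndpointIdentifier: ENDPOINT_ID,
            StaticMembers: [...new Set([...current, instanceId])],
        })
    );
}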

Let's dive deeper into how to pre-warm the local database cache in Aurora PostgreSQL. At a high level, there are two ways to populate the cache:

  • Using the pg_prewarm extension on relevant tables that need to be put in the cache.
  • Running relevant SELECT statements that are important for your application's performance. When a SELECT statement runs, the database reads the relevant data blocks from disk and caches them in memory.

To keep the solution generic, we use pg_prewarm. You configure a list of tables you want pre-warmed before deployment.

When the Lambda function is triggered, it iterates through the list of tables and runs pg_prewarm, as in the following snippet from the solution:

// Iterate over the configured tables and load each into shared_buffers.
// Table names come from the solution's trusted configuration, not user input.
for (let itemToPrewarm of arrItemsToPrewarm) {
    await pgClient.query(`select pg_prewarm('${itemToPrewarm}')`)
}

If you want more granular control over what’s pre-warmed (instead of entire tables), you can adjust the solution to run a list of SELECT statements and those relevant data blocks would be populated in the database’s shared_buffers.

There are two other considerations when designing the solution:

  • How to automatically invoke the Lambda function when new replicas are provisioned
  • How to make sure clients are connecting to only pre-warmed replicas

Amazon EventBridge provides real-time access to changes in data in AWS services, your own applications, and software as a service (SaaS) applications without writing code. Amazon RDS and Aurora publish a number of different events, but for our specific use case, we use a DB instance event (specifically RDS-EVENT-0005, emitted when a DB instance is created).

To monitor this event, you need to create an EventBridge rule. A rule defines which events you're interested in through an event pattern, and where matching events are sent; in this case, the target is the pre-warmer AWS Lambda function. The RDS-EVENT-0005 event has the following example payload:

{
    "version": "0",
    "id": "831c29a5-5766-073d-d04c-00bc47cb5a5b",
    "detail-type": "RDS DB Instance Event",
    "source": "aws.rds",
    "account": "123456789012",
    "time": "2024-07-04T01:46:31Z",
    "region": "ap-southeast-1",
    "resources": [
        "arn:aws:rds:ap-southeast-1:123456789012:db:aurora-demo-rr2"
    ],
    "detail": {
        "EventCategories": [
            "creation"
        ],
        "SourceType": "DB_INSTANCE",
        "SourceArn": "arn:aws:rds:ap-southeast-1:123456789012:db:aurora-demo-rr2",
        "Date": "2024-07-04T01:46:31.340Z",
        "Message": "DB instance created",
        "SourceIdentifier": "aurora-demo-rr2",
        "EventID": "RDS-EVENT-0005"
    }
}

In our EventBridge rule, we use the following event pattern:

{
    "detail-type": ["RDS DB Instance Event"],
    "source": ["aws.rds"],
    "detail": {
      "EventID": ["RDS-EVENT-0005"]
    }
}

We then target the pre-warmer Lambda function, which receives the event payload. This effectively means that the Lambda function is invoked when a new DB instance gets created. To make sure that we only pre-warm the right DB instance belonging to the right Aurora DB cluster, we validate that the new DB instance is a member of the configured DB cluster before we start the actual pre-warming process.
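
Because the solution is built with the AWS CDK, the rule and target can be wired up in the stack roughly as follows. This is a minimal sketch; prewarmerFunction stands in for the Lambda function construct defined elsewhere in the stack:

import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';

// Inside the stack: match RDS-EVENT-0005 on the default event bus and
// deliver matching events to the pre-warmer Lambda function.
const rule = new events.Rule(this, 'DbInstanceCreatedRule', {
    eventPattern: {
        detailType: ['RDS DB Instance Event'],
        source: ['aws.rds'],
        detail: { EventID: ['RDS-EVENT-0005'] },
    },
});
rule.addTarget(new targets.LambdaFunction(prewarmerFunction));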

Now that we have the mechanism to automatically invoke the pre-warming function, let’s look at how to make sure clients are only connecting to pre-warmed replicas.

To group pre-warmed replicas under one DNS alias, use a custom endpoint. This way, Aurora can balance connections among the instances behind the custom endpoint, and you control when a new replica starts receiving traffic, as long as clients use the provided DNS alias. The custom endpoint must be created before you install the solution, using the AWS Management Console, AWS Command Line Interface (AWS CLI), or an AWS SDK. For more information about using the AWS CLI, refer to End-to-end AWS CLI example for custom endpoints.
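
For reference, a minimal SDK-based sketch of pre-creating a reader custom endpoint follows; all identifiers are illustrative:

import { RDSClient, CreateDBClusterEndpointCommand } from '@aws-sdk/client-rds';

const rds = new RDSClient({});

// Create a reader-only custom endpoint for the pre-warmer to manage.
async function createCustomEndpoint(): Promise<void> {
    await rds.send(new CreateDBClusterEndpointCommand({
        DBClusterIdentifier: 'aurora-demo',        // target Aurora cluster
        DBClusterEndpointIdentifier: 'prewarmer',  // becomes the DNS alias prefix
        EndpointType: 'READER',                    // serve read traffic only
        StaticMembers: ['aurora-demo-instance-1'], // seed with already-warmed instances
    }));
}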

Prerequisites

The solution uses the AWS Cloud Development Kit (AWS CDK). Refer to the Getting Started Guide to learn more about AWS CDK.

You also need the following:

  • The Git CLI
  • The pg_prewarm extension in the target Aurora PostgreSQL DB cluster, which you can load using CREATE EXTENSION IF NOT EXISTS "pg_prewarm".
  • Pre-created custom endpoint. To guide you on the creation of the custom endpoint, Amazon Aurora’s documentation provides an end-to-end AWS CLI example.
  • The Docker daemon, installed and running. It is used by the AWS CDK when bundling (compiling TypeScript and packaging dependencies) the AWS Lambda function that does the actual pre-warming.
  • Aurora PostgreSQL database credentials stored in AWS Secrets Manager.

Deploy the solution

Before you deploy the solution, you can perform a runtime test to compare with the same test post-deployment. We discuss the steps for testing the solution in the next section. Complete the following steps to deploy the solution:

  1. Clone the GitHub repository.
  2. Run npm install in the terminal.
  3. Open the context.json file at the root of the project folder and populate the following parameters (an illustrative example follows these steps):
    • vpcId: the ID of the VPC that the target Amazon Aurora cluster runs in.
    • clusterIdentifier: the cluster identifier of the target Amazon Aurora cluster.
    • customEndpointIdentifier: the identifier of the pre-created custom endpoint. For example, if the custom endpoint is prewarmer.cluster-custom-cbu3gzzt45sd.ap-southeast-1.rds.amazonaws.com, the identifier is prewarmer.
    • dbSecretArn: the ARN of the AWS Secrets Manager secret holding the credentials for the Amazon Aurora database. The pre-warmer Lambda function uses the credentials stored in the configured secret to avoid hard-coding credentials. The credentials are never emitted or logged anywhere and are only used to connect to the target Aurora PostgreSQL read replica.
    • itemsToPrewarm: the list of tables to pre-warm. Each entry can be either "tableName" or "schemaName.tableName".
    • dbSecurityGroupId: the security group associated with the Amazon Aurora database. This security group is modified to allow connectivity from the pre-warmer Lambda function's security group.
    • database: the name of the database to connect to.
    • prewarmerSubnetIds: the list of subnet IDs that the pre-warmer Lambda function will be deployed in (minimum of one). The subnets' routes must provide access to either a NAT gateway or VPC endpoints for Amazon RDS and Secrets Manager.
  4. Run cdk deploy in the terminal (make sure the Docker daemon is running on the machine you're deploying from). You should see a breakdown of what's going to be deployed. Confirm and wait for the deployment to finish.
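
For illustration, a populated context.json might look like the following; every value here is a placeholder, not a real resource:

{
    "vpcId": "vpc-0123456789abcdef0",
    "clusterIdentifier": "aurora-demo",
    "customEndpointIdentifier": "prewarmer",
    "dbSecretArn": "arn:aws:secretsmanager:ap-southeast-1:123456789012:secret:aurora-demo-credentials-AbCdEf",
    "itemsToPrewarm": ["public.sample_table", "orders"],
    "dbSecurityGroupId": "sg-0123456789abcdef0",
    "database": "postgres",
    "prewarmerSubnetIds": ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
}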

The solution automatically runs the Lambda function when a new read replica is deployed in the Aurora DB cluster that you configured the solution with.

Test the solution

You can run a quick test before and after the solution has been deployed. If you have specific data and a query that you want to test, you can skip the sample data generation step.

If you need test data to sample the effectiveness of the pre-warmer, you can use the following SQL statements to generate 60 million records:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
create table sample_table(id character varying(100), quantity integer, price numeric, encrypted_name character varying(100));
-- Each INSERT adds 10 million rows of random data; running it six times yields 60 million rows.
insert into sample_table select uuid_generate_v4(), random()*10, random()*100, md5(random()::text) from generate_series(1, 10000000);
insert into sample_table select uuid_generate_v4(), random()*10, random()*100, md5(random()::text) from generate_series(1, 10000000);
insert into sample_table select uuid_generate_v4(), random()*10, random()*100, md5(random()::text) from generate_series(1, 10000000);
insert into sample_table select uuid_generate_v4(), random()*10, random()*100, md5(random()::text) from generate_series(1, 10000000);
insert into sample_table select uuid_generate_v4(), random()*10, random()*100, md5(random()::text) from generate_series(1, 10000000);
insert into sample_table select uuid_generate_v4(), random()*10, random()*100, md5(random()::text) from generate_series(1, 10000000);

To compare the runtime using psql, make sure to enable runtime output using \timing.

Before the solution has been deployed, provision a fresh read replica, connect to it using psql, enable \timing, and run the following query (using the test data that you generated earlier):

select avg(price) from sample_table where quantity>5;

In our testing using db.r5.large, the resulting runtime is 10.729 seconds.

After you deploy the pre-warming solution, provision a fresh read replica and rerun the test. We use the same db.r5.large, and the resulting runtime is 4.198 seconds.

Solution Scaling

For environments with many database deployments, or Aurora clusters with a large number of read replicas, remember that this solution is invoked whenever a new DB instance is created. Depending on the environment configuration and how frequently new DB instances or read replicas are created, you may need to review your Lambda service quotas to make sure you have adequate concurrent executions. This is especially important if Lambda is also used for other workloads in the same AWS account (such as a backend for an API, or other event-based use cases).

Cost Considerations

The solution has two major cost components:

  1. Amazon EventBridge (pricing page)
  2. AWS Lambda (pricing page)

Because the events we're interested in are emitted by the RDS service itself onto the default event bus, there's no additional cost. In addition, EventBridge doesn't charge extra for rules or event delivery.

For AWS Lambda, the CDK configuration of the pre-warmer Lambda function uses 256 MB of memory. The number of invocations largely depends on how many new DB instances are created in the AWS account, and the duration of each invocation depends on two factors:

  1. Whether the new DB instance is a read replica that belongs to the Aurora PostgreSQL cluster configured in the solution. If it isn't, the Lambda function exits right away, which generally takes very little time.
  2. The number and size of the tables to pre-warm.

As a rough illustration of how duration translates to cost, a single 30-second pre-warming invocation at 256 MB consumes 0.25 GB × 30 s = 7.5 GB-seconds of billable compute.

Clean up

To avoid future charges, you can either delete the AWS CloudFormation stack using the AWS CloudFormation console, or run the following command in your terminal:

cdk destroy

This deletes all the provisioned resources related to the solution. If you provisioned an Amazon Aurora cluster just to test the solution, refer to the Deleting Aurora DB clusters and DB instances documentation to guide you on the clean-up to avoid future charges related to testing the solution.

Conclusion

In this post, we covered how to improve query performance of a new replica by warming up the cache using pg_prewarm. In addition, we shared how to invoke the automation by integrating Aurora with EventBridge, which allows you to receive certain events and act on specific events using EventBridge rules and event patterns. Lastly, we showed how to use custom endpoints to control the set of DB instances the client can connect to.

You can find the complete solution on GitHub. Try it out for yourself, and leave your feedback in the comments.


About the Author

Jan Michael Go Tan is a Principal Solutions Architect for Amazon Web Services. He helps customers design scalable and innovative solutions with the AWS Cloud.