AWS Database Blog
How to use ProxySQL with open source platforms to split SQL reads and writes on Amazon Aurora clusters
The blog post How to set up a single pgpool endpoint for reads and writes with Amazon Aurora PostgreSQL introduces an architecture that uses the read and write split capabilities of Amazon Aurora PostgreSQL endpoints. This type of architecture works great for Aurora PostgreSQL clusters, but what if you are using Amazon Aurora MySQL clusters for your database? This post complements the original one by introducing ProxySQL as an intermediate layer for achieving read and write splits with Aurora MySQL endpoints.
Amazon Aurora provides endpoints for the primary DB instance (the cluster endpoint) and for Read Replicas (the reader endpoint). Aurora updates the cluster endpoint automatically so that it always points to the primary instance. It uses the reader endpoint to do DNS round-robin for the connections for read operations across all available Read Replicas. Additionally, you can use Amazon Aurora custom endpoints to further segregate the traffic for your application.
Amazon Aurora Replicas typically have less than 100 ms of replication lag. So, if your application can tolerate that lag, it can use both the cluster and reader endpoints to take advantage of a horizontally scaled database, as shown following.
The preceding diagram shows the current architecture with the application deciding which endpoint to use. However, managing two database endpoints, one for reads and one for writes, adds complexity to the application. Some third-party drivers support a narrower set of use cases, but you can apply the ideas in this post more broadly to read and write split use cases.
In this post, we show how to use ProxySQL to provide a single MySQL-compatible Aurora endpoint that automatically directs write traffic to the cluster endpoint and read traffic to the reader endpoint. The following diagram shows the proposed solution based on ProxySQL middleware.
Architecture
ProxySQL is a GNU General Public License (GPL)–licensed MySQL proxy middleware that sits between a MySQL database and the database clients. In this example, we deploy it using the architecture in the following figure.
This diagram shows how you can use ProxySQL as middleware to provide a single endpoint for a MySQL-compatible Aurora cluster. The Aurora cluster consists of one primary instance and two Aurora Read Replicas that are deployed across two Availability Zones and two private subnets. The cluster has a security group that allows incoming data only from the ProxySQL instances.
ProxySQL is deployed behind a Network Load Balancer in an Auto Scaling group with a single active instance (for demonstration purposes in this post) for failover purposes. You can change this configuration by updating the master.yaml
file (under Resources.ProxySQLStack.Properties.Parameters
), as shown following:
This configuration change provisions two ProxySQL instances for regular load, and it can go up to four instances during peak loads. Also, this improves the availability of the database if there is a disruption to one of the ProxySQL nodes.
ProxySQL is also isolated in private subnets with a security group that permits access only from allowed Classless Inter-Domain Routing (CIDR) blocks.
The access subnets host a Network Load Balancer that provides a consistent endpoint for ProxySQL. So, your database endpoint doesn’t change if the ProxySQL instance fails and the Auto Scaling group creates a new one.
Deploying with AWS CloudFormation
The AWS CloudFormation templates that are used in this example are located in this GitHub repository. The example uses several nested AWS CloudFormation templates to deploy the virtual private cloud (VPC) infrastructure, the security groups, the Aurora cluster, and the ProxySQL middleware. Using nested stacks lets you break up a large stack into several reusable components. If you aren’t familiar with AWS CloudFormation, review the AWS CloudFormation documentation.
For complete instructions on deploying the templates, see the README file on GitHub. The following sections contain some of the highlights.
Creating the Amazon Aurora cluster
The following AWS CloudFormation snippets show how to create a three-node Amazon Aurora cluster when the VPC infrastructure and security groups are ready. Amazon Aurora sets one node as the primary and the other two nodes as Read Replicas.
Deploying ProxySQL
In the AWS CloudFormation template that deploys ProxySQL, you set up an ELB load balancer and an Auto Scaling group. The launch configuration for the Auto Scaling group deploys and configures ProxySQL using the AWS CloudFormation cfn-init tool.
In this solution, we use the Amazon Linux AMI for each Amazon EC2 instance. The Amazon Linux AMI includes packages and configurations that provide seamless integration with AWS. The repositories are available in all AWS Regions, and you can access them using yum. Hosting repositories in each Region allows you to deploy updates quickly and without any data transfer charges. If you want more information about updating your EC2 instance software, see Updating Instance Software in the Amazon Elastic Compute Cloud User Guide.
First, you need to install the MySQL client:
Next, download the ProxySQL repo:
Then, install the repo:
Finally, start ProxySQL:
Configuring ProxySQL
You can do most of the ProxySQL configuration using the MySQL client by connecting locally using admin credentials. The default user name and password for ProxySQL are as follows:
‘admin’ and ‘admin’
We strongly recommended that you change these credentials after the initial configuration.
The following command configures initial read and write split options in ProxySQL:
Although the options contained in the proxysql.sql
file are good for a quick split of the read and write traffic, you can configure more sophisticated query routing rules in ProxySQL. The following provides an explanation of the contents of the proxysql.sql
file.
First, you ensure that the mysql_servers
and mysql_replication_hostgroups
tables are clean for your config. The mysql_servers
table contains entries for all the backend endpoints that are to be proxied by ProxySQL. The mysql_replication_hostgroups
table contains the configuration for automatic identification and segregation of read-only endpoints.
Before version 2, ProxySQL checked for the “read_only
” variable to determine reader endpoints from writer endpoints, whereas Amazon Aurora bases its definition of readers versus writers based on the “innodb_read_only
” variable. ProxySQL version 2.0 has added support for custom specification for variable definition using the check_type
column on the mysql_replication_hotgroups
table.
Next, you configure your endpoints in the mysql_servers
table:
Then, you configure the mysql_replication_hostgroups
table to define the writer and reader host groups using the check_type
value of “innodb_read_only
”:
Here, you write the configuration back to the disk so that it persists if the ProxySQL service is restarted:
Next, you ensure that the mysql_query_rules
table is clean for your config:
Then, you define the query routing rules for your needs. You can define your rules to support a variety of use cases including sharding, digest-based, or read/write split-based. This example shows a basic read/write split scenario configuration:
The previous two rules configure for all write traffic to go into the writer host group and all the read traffic to go to the reader.
Again, you write the configuration back to the disk so that it persists if the ProxySQL service is restarted:
Next, you ensure that the mysql_users
table is clean for your config:
Next, you define your backend users in the ProxySQL config:
Again, you write the configuration back to the disk so that it persists if the ProxySQL service is restarted:
Next, you configure the monitoring user in ProxySQL. This user is used for backend monitoring and needs Usage and Replication Client privileges:
Then, you configure the Backend MySQL version and tweak some monitoring intervals:
Again, you write the configuration back to the disk so that it persists if the ProxySQL service is restarted:
You should provide the following variables to the proxysql.sql
file for the setup:
Variable | Value | Notes |
ClusterEndpoint | Aurora cluster endpoint | This is also the writer host for ProxySQL. |
ReaderEndpoint | Aurora reader endpoint | This is also the reader host for ProxySQL. |
DatabaseUser | Aurora database user | This is also used as a monitoring user for your cluster. You can replace it with a custom user. If you create a custom user for monitoring, be sure to update the proxysql.sql config file. |
DatabasePassword | Aurora database password | This is also used as the password for the monitoring user for your cluster. If you create a custom user for monitoring, be sure to update the corresponding password in the proxysql.sql config file. |
DatabaseName | Aurora database name | |
BackendMySQLVersion | The Amazon Aurora database MySQL version | This is the version returned by ProxySQL back to your consuming application. Ensure that this is set to the same version as for the backend. The default setting for ProxySQL version 2.0 is 5.5.30. |
Also note that the default port and interface that ProxySQL provides for client connections are as follows:
And following is the default setting for the administration of ProxySQL:
Testing the configuration
Now you can issue a few SQL statements and confirm that ProxySQL is directing traffic as expected.
Start by creating a table and inserting a few rows:
Checking in ProxySQL stats schema
, you can see that these statements are directed to the Amazon Aurora primary host group (host group 10):
hostgroup | schemaname | username | digest | digest_text | count_star |
10 | proxysqlexample | proxysqluser | 0xCF52DCD38A9B9942 | CREATE TABLE Persons ( PersonID int, LastName varchar(?), FirstName varchar(?), Address varchar(?), City varchar(?) ) | 1 |
10 | proxysqlexample | proxysqluser | 0xE7B5E8C714313F56 | INSERT INTO Persons (PersonID, LastName, FirstName, Address, City) VALUES (?, ?, ?, ?, ?) | 3 |
Next, issue a query:
Checking in ProxySQL stats schema
, you can see that this statement is directed to the Amazon Aurora reader endpoint (host group 20):
hostgroup | schemaname | username | digest | digest_text | count_star |
10 | proxysqlexample | proxysqluser | 0xCF52DCD38A9B9942 | CREATE TABLE Persons ( PersonID int, LastName varchar(?), FirstName varchar(?), Address varchar(?), City varchar(?) ) | 1 |
10 | proxysqlexample | proxysqluser | 0xE7B5E8C714313F56 | INSERT INTO Persons (PersonID, LastName, FirstName, Address, City) VALUES (?, ?, ?, ?, ?) | 3 |
20 | proxysqlexample | proxysqluser | 0xDEB542EDC426A35F | select * from Persons | 1 |
As a final test, confirm that ProxySQL handles SELECT statements that are embedded in a transaction:
These statements should all hit the primary node as the embedded SELECT is querying rows that are written in the transaction. You can again see from the log that things are working as expected.
hostgroup | schemaname | username | digest | digest_text | count_star |
10 | proxysqlexample | proxysqluser | 0xCF52DCD38A9B9942 | CREATE TABLE Persons ( PersonID int, LastName varchar(?), FirstName varchar(?), Address varchar(?), City varchar(?) ) | 1 |
10 | proxysqlexample | proxysqluser | 0xE7B5E8C714313F56 | INSERT INTO Persons (PersonID, LastName, FirstName, Address, City) VALUES (?, ?, ?, ?, ?) | 5 |
20 | proxysqlexample | proxysqluser | 0xDEB542EDC426A35F | select * from Persons | 1 |
10 | proxysqlexample | proxysqluser | 0xFAD1519E4760CBDE | BEGIN | 1 |
10 | proxysqlexample | proxysqluser | 0x379FAD0823D3FF7B | SELECT * from Persons where PersonID > ? | 1 |
We recommend that you do a full regression test with your own database clients before using this configuration in production. If the scalability of the ProxySQL middleware concerns you, consider using a clustered ProxySQL instance, where the Auto Scaling group allows multiple instances and scales based on load.
Summary: Striving for simplicity
In this post, you learned how to use ProxySQL to provide a single endpoint for an Amazon Aurora cluster that automatically directs read traffic to the reader endpoint. You can use this technique to simplify how your application handles connections to Amazon Aurora.
If you need more advanced routing options, such as the ability to route database connections based on custom logic, you can explore the query routing capabilities of ProxySQL in further detail. With custom logic, you can handle more advanced routing, such as sending a small percentage of queries to a newer schema for canary testing or weight-based routing.
In the future, keep an eye on Amazon Aurora Serverless, which scales the database backend automatically without requiring the configuration of replicas.
About the Authors
Tulsi Garg is a Senior Solutions Architect for AWS Worldwide Public Sector Canada Team. He works with Canadian Public Sector customers (Government, Non Profit Organizations, Startups and Educational Institutes) to help accelerate their journey to the cloud. He is passionate about IoT and has been found to use his daughters as an excuse to buy tech toys.
Diego Magalhães is a AWS Senior Solutions Architect – World Wide Public Sector, working with Education customers throughout Canada. Always looking for new challenges, the latest one was trading the sunny weather in Brazil for the Toronto cold winters. ☃️