AWS Database Blog
Reduce network transfer time with connection compression in Amazon RDS for MySQL and Amazon RDS for MariaDB
Our customers have seen many cases where network bandwidth between the MySQL client and database becomes the source of bottleneck. This leads to increase in the query latencies causing performance impact. This is especially seen for workloads which have high network bandwidth requirement. MySQL and MariaDB solve this problem by allowing you to compress traffic between the client and the database server.
In this post, we discuss how to use connection compression with Amazon Relational Database Service (Amazon RDS) for MySQL and Amazon RDS for MariaDB, and explore their benefits with a test case.
MySQL Client connection compression options
In version 8.0.18, MySQL introduced support for the protocol_compression_algorithms
system variable, which lists the compression algorithms that the database server permits for incoming connections. With Amazon RDS for MySQL, this parameter is set to zlib,zstd,uncompressed
, which means that by default, we allow three types of connections: zlib compressed, zstd compressed, and uncompressed.
While using the MySQL client, the --compression-algorithm
command line option allows the client to list the desired compression algorithms while connecting to the database instance. This value defaults to uncompressed.
The --compress
command line option that was available with earlier versions of MySQL client tools has been deprecated as of MySQL 8.0.18. With MySQL client versions prior to 8.0.18, the client supports only the zlib compression algorithm, in addition to uncompressed connections. You cannot specify a compression algorithm or the compression level with this option, so we recommend upgrading to the latest client driver to take advantage of compression.
To set up the connection, both the client and the database server must agree on a mutually permitted compression algorithm. Since Amazon RDS for MySQL supports all 3 compression options, the client can choose any algorithm. Therefore, if you specify multiple options with --compression-algorithms
, the first compression algorithm supported by the database server gets picked. For example, if --compression-algorithms=zlib,zstd
, then the zlib compression algorithm is picked. If the database server doesn’t support the zlib compression algorithm (which isn’t the case with Amazon RDS for MySQL), then the zstd compression algorithm is picked.
Zstd has proven to be more efficient algorithm as compared to ZLib in terms of compression speed, decompression speed, and compression ratio. You may evaluate both compression algorithms to see which one is optimal for your workload.
If you’re using the zstd compression algorithm to establish connections, you also have the flexibility to choose the compression level. The --zstd-compression-level
command line option for the MySQL client allows you to configure the zstd compression level for the connection. This defaults to 3. The permitted levels are 1–22.
For additional reading refer to the presentation on ZStandard in ZFS which compares ZSTD with other compression algorithms and also compares the ZSTD compression levels
Although higher values for --zstd-compression-level
provide a greater degree of compression, this also leads to increased utilization of the compute resources required to compress and decompress the result set packets, thereby reducing query performance. We must need to carefully choose the compression level to strike a balance between compressing the network traffic and utilizing compute resources.
Configure your compression algorithm and compression level
To get the most out of this feature, it’s important to choose the right compression algorithm and compression level for your workload. One way to achieve this is to benchmark the performance of your workloads using different compression algorithms(zstd, zlib, and uncompressed) and in the case of zstd, different compression levels which can range from 1-22.
In this section, we present a few examples using the MySQL client to establish and monitor client connections.
Connect to the RDS instance using the MySQL client and compressed connections with the following command:
There are several methods to check if your connection to the MySQL instance is compressed.
First, you can use the status(\s)
command on the MySQL command line. Look for the Protocol field, which shows the value of Compressed:
Note: The protocol version field provides information on the version of TLS used and should not be confused with the ZSTD compression level.
You can also check status variables like Compression
, Compression_algorithm
, and Compression_level
, to check the compression status of your current connection.
If performance schema is enabled, you can also check if compression is turned on for other connections from the application using the following query. You can use this query to get the processlist output and determine whether compression is turned on or off.
Connection Compression with Amazon RDS for MariaDB
The MariaDB Client also supports connection compression. The --compress
parameter passed to MariaDB client forces the use of compression in server/client protocol. You cannot specify the compression algorithm or compression levels with the --compress
argument.
You can use the status(\s)
command on the command line and look for the Protocol field, which shows the value of Compressed:
You can also check the status variable Compression
, to get the compression status of your current connection.
Performance benchmarking
Let’s understand the utility of connection compression with an example.
We create an Amazon Elastic Compute Cloud (Amazon EC2) instance and install the MySQL Client. We also create an Amazon RDS for MySQL database instance with the following configuration.
- Client – Amazon EC2 t4g.2xlarge instance, 8 VCPUs, 32 GB RAM
- Database server – Amazon RDS for MySQL 8.0.31 with a db.r6g.2xlarge instance, 8 VCPUs, 64 GB RAM
Note that using burstable instance classes like t2/t3/t4g as the client, can starve on CPU when the CPU credits are exhausted.
We use a single select query with a full table scan for the simplicity of demonstration.
Create a test table using sysbench.
Now use sysbench to insert 20,000,000 rows into the table:
Now that we have populated the table, we start running a query with a full table scan, which scans all the rows and returns all the rows in its result set back to the client.
After connecting using the MySQL client, you can filter the result set from being displayed on the terminal by using the pager command at the MySQL client:
We run a select on a table with full table scan.
We measure the Network Transfer Speed using Enhanced Monitoring feature available on RDS. In the Enhanced monitoring OS metrics, we add the network RX
and TX
to get the total data transferred. You can also use the nload tool to measure network bandwidth on the EC2 instance where MySQL Client is installed. Our results are as follows:
ZSTD Compression Level | Max Network bandwidth Utilization(tx+rx) (MBps) | Time to Execute Query and Return Result Set(seconds) |
0 | 231 | 17.25 |
3 | 28 | 72.11 |
7 | 16 | 126.55 |
10 | 11 | 182.45 |
In this experiment, we see no performance gains by using compressed connections. The network bandwidth utilization reduces when the compression level is increased, but we also see an increase in the time to run the query and return the result set. The query completion time increased along with the increase in the zstd compression levels. This increase in execution time is due to the additional overheads required to compress the result set. Now let’s use a table with larger row size:
We populated random data in to the table, as shown in the following queries, instead of using sysbench.
We run a select on the table with a full table scan.
Our results are as follows:
ZSTD Compression Level | Max Network bandwidth Utilization(tx+rx) (MBps) | Time to Execute Query and Return Result Set(seconds) |
0 | 594 | 32.43 |
3 | 1.7 | 11.01 |
7 | 0.26 | 69.92 |
10 | 0.23 | 72.64 |
In this experiment, we used the repeat function to load data into the table. Due to the redundant nature of the table data, the compression algorithm was able to efficiently compress the data. We see that the network bandwidth reduced considerably with compressed connections. It reduced from 594 MBps to 1.7 MBps with a zstd compression level of 3. This also reduced the query completion time. However, when the compression level was further increased to 7, the query took more time to complete.
Conclusion
In this post, we discussed how connection compression is helpful for certain kinds of workloads on Amazon RDS for MySQL and Amazon RDS for MariaDB where network bandwidth is a constraint and result sets are large. We also showed you how the degree of compression depends on the nature of the data set.
You can see the benefit in cases where network transfer time is larger compared to the cost of compute resources required for compression and decompression operations of the connection payload.
In order to check if connection compression is the right fit for your workload, we suggest you perform benchmark tests.
More details on Connection Compression with MySQL are available in the MySQL Reference Manual. Add a comment to share with us how useful is connection compression for your workload.
About the Author
Mershad Irani is a Database Engineer (DBE) at Amazon Web Services.