AWS Database Blog
Determining the optimal value for shared_buffers using the pg_buffercache extension in PostgreSQL
In OLTP databases, the buffer cache is memory allocated for storing cached data and dirty pages. Cached data speeds up the retrieval of frequently accessed information, and reading from these cached pages minimizes disk I/O operations. Setting appropriate values for shared_buffers
is important for optimal PostgreSQL performance and can lead to significant reductions in overall database operational costs. In this post, we discuss the importance of the shared_buffers
configuration and guide you on determining its optimal value using the pg_buffercache
extension. The optimization strategies outlined in this post are applicable to PostgreSQL, regardless of whether it’s self-managed or hosted on Amazon Aurora PostgreSQL-Compatible Edition or Amazon Relational Database Service (Amazon RDS) for PostgreSQL.
Overview of shared_buffers
The global memory area in PostgreSQL mainly consists of shared_buffers
, wal_buffers
, and CLOG buffers. The shared_buffers
memory area is used to store cached data and dirty pages. A dirty page is a written or modified data page in shared_buffers
as the result of a write operation against index or table data. These dirty pages are flushed to the disk by a background writer process when the number of clean shared_buffers
appears to be insufficient. The objective of this process is to ensure that the data pages are moved to permanent storage and free space is available for use in shared_buffers
. Furthermore, checkpoints are automatically issued in the background based on checkpoint settings writing all dirty pages to disk to create a restore point in the event of a crash. Amazon Aurora PostgreSQL uses log records for higher availability instead of checkpoints for crash recovery. The following diagram illustrates the memory components of PostgreSQL.
The default configuration of shared_buffers
varies depending on the PostgreSQL deployment. In the community version of PostgreSQL, the default value for shared_buffers
is set at 128 MB. However, this value might be lower if your kernel settings are not supportive, as determined during the initdb
process. For Amazon RDS for PostgreSQL, the default is calculated using the formula DBInstanceClassMemory/32768
. Both community PostgreSQL and Amazon RDS for PostgreSQL lean heavily on the operating system for caching, making an allocation of 30–35% of memory to shared_buffers
ideal. Conversely, in Amazon Aurora PostgreSQL, the default value is derived from the formula SUM(DBInstanceClassMemory/12038, -50003)
. This difference stems from the fact that Amazon Aurora PostgreSQL does not depend on the operating system for data caching. As a result, shared_buffers
allocation in Amazon Aurora PostgreSQL is notably higher compared to that in Amazon RDS for PostgreSQL.
How to set shared_buffers
In both Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL, you can establish a custom parameter group to alter any parameter configuration. The shared_buffers
setting is adjusted through this custom parameter group. Notably, shared_buffers
is a static parameter, meaning any modifications require a reboot of the database instance for the changes to take effect. This parameter is defined as a number of 8 kB blocks. To determine the value of shared_buffers
for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL within the RDS parameter group, employ the following formula:
shared_buffers
= (Target percentage x Total RAM in GB x 1024 x 1024) /(100 x 8)
For example, if you’re running an RDS instance on the db.m6g.12xlarge instance class (192 GB of RAM), a value of 10,066,329 on the shared_buffers
parameter will set 40% of total RAM:
(40 x 192 x 1024 x 1024) /(100 x 8)
To set the shared_buffers
value in your parameter group, complete the following steps:
- On the Amazon RDS console, choose Databases in the navigation pane.
- Navigate to your database and in the Instance section, choose the link for your associated parameter group.
- In the Modifiable parameters section, search for and modify the
shared_buffers
parameter in the custom parameter group using the preceding formula.
Applying the new value of shared_buffers
requires an instance reboot.
shared_buffers and I/O utilization
In both Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL, for read-intensive workloads, the engine tries to fulfill the query request using pages already in the buffer cache. If the required pages are absent from the cache, the PostgreSQL database engine initiates reads against the storage layer to fetch them. When workloads are served entirely from the cache, there’s no read I/O consumption. However, when they aren’t, I/O charges apply for the pages sourced from storage. For example, if the engine scans 8 data pages to retrieve 10 tuples, it registers 8 read I/O operations.
Amazon CloudWatch has the metric BufferCacheHitRatio
, which displays the percentage of requests catered by the buffer cache. A higher hit ratio implies reduced read I/O. Therefore, to curtail read I/O, shared_buffers
should be expansive enough to house the majority of the working dataset. For a deeper understanding of I/O characteristics in Amazon RDS for PostgreSQL, refer to I/O size and volume throughput limits.
Benefits of analyzing shared_buffers
shared_buffers
in PostgreSQL plays a critical role in database performance. By storing data and indexes in memory, shared_buffers
can significantly reduce the number of disk I/O operations. This results in faster query performance and reduces overall database cost. shared_buffers
uses the least recently used (LRU) algorithm for cached pages. The LRU caching algorithm removes the least recently used data pages when the cache is full, and a new page is referenced that is not in the cache.
When shared_buffers
is too small, it could result in frequent reads from storage, and the pages in the buffer may have to be constantly flushed to and from the disk. This affects performance and lowers query concurrency. However, setting the value of shared buffers too high can result in increased memory pressure on the system, which can lead to swapping and decreased performance. Factors to consider when tuning the shared_buffers
setting include the amount of memory available on the system, the size of the working dataset, and the workloads on the database. Analyzing the shared_buffers
setting and making appropriate adjustments ensures that the PostgreSQL database is performing optimally and providing the best possible result for workloads.
Use AWS monitoring tools to determine shared_buffers modification
In Amazon Aurora PostgreSQL, the BufferCacheHitRatio
CloudWatch metric can help you determine whether to modify shared_buffers
or not. If BufferCacheHitRatio
is below 90%, consider increasing shared_buffers
to serve more data from cache. In Amazon RDS for PostgreSQL, the hit ratio can be identified by querying the pg_statio_user_tables
and pg_statio_user_indexes
system catalogs. The following query identifies the hit ratio on tables:
The following query identifies hit ratio on indexes:
To finalize your shared_buffers
modification, in addition to CloudWatch metrics, you can use Amazon RDS Performance Insights. Performance Insights is a database performance tuning and monitoring feature that helps you quickly assess workloads on your database and determine next steps. Counter metrics such as blks_hits
, blks_read
, tup_fetched
, and tup_returned
can be used to better estimate shared_buffers
. Performance Insights offers 7 days of performance data history at no charge. For pricing details, refer to Performance Insights Pricing. The following screenshot shows these counters for an Aurora PostgreSQL instance.
Analyze shared_buffers using the pg_buffercache extension
After analyzing the hit ratio, you can use the pg_buffercache
extension to examine the contents of the shared buffer cache in real time. The pg_buffercache
extension provides useful information about the buffer cache and helps you analyze the performance of a PostgreSQL database. pg_buffercache
functions provide information about the number and size of buffers in the cache, the number of hits and misses, and the state of individual buffers. This information can be used to help diagnose performance problems and optimize database performance.
The pg_buffercache
result set provides one row for every buffer in the shared cache. If a buffer is unused, all its fields will appear as null, except for the bufferid
. A high number of cache misses can suggest that the shared_buffers
value might be set too low, causing the database to engage in more disk I/O operations than required. If this is observed, adjust the shared_buffers
value upward, which in turn can enlarge the buffer cache and decrease cache misses.
Let’s look at how to use pg_buffercache
to analyze the buffer cache.
Note: All SQL queries are tested on PostgreSQL 15. You may have to modify the queries for older PostgreSQL versions.
- Connect to your Aurora PostgreSQL or RDS for PostgreSQL database using your preferred SQL client.
- Run the following SQL statement to install the
pg_buffercache
extension: - Run the following query to show the sorted relations based on the buffer cache used:
The following screenshot shows the output.
The preceding query provides the number of buffers used by each relation of the current database. Now, we run a sample workload on a sample database and analyze pg_buffercache
.
- Run the following query to identify the top 10 relations residing in
shared_buffers
and percentage utilization:
The output columns are as follows:
- buffer_size – The size of buffer related to the relation
- relation_size – The human-readable relation size
- buffer_percentage – The percentage of
shared_buffers
occupied by the relation - relation_percentage – The percentage of relations that reside in
shared_buffers
- relation_type – The type of relation, such as table, index, or sequence
The following screenshot shows our output.
In this test scenario, shared_buffers
is set at 1.95 GiB. The pgbench_accounts
table is 139 GiB in size. 52.50% of shared_buffers
is occupied by 1 GiB of this table. The rest of shared_buffers
(.95 GiB) is used by other relations. If the working dataset of this table is larger than the cached data, to host more data for caching, shared_buffers
should be modified to a higher value. While modifying the shared_buffers
value, it’s imperative to monitor FreeableMemory
, because other process-based memory allocations such work_mem
and maintenance_work_mem
also share memory from the same memory pool. If there isn’t sufficient space to increase the shared_buffers
value, then it’s advisable to scale up the instance class to secure a larger total RAM allocation.
shared_buffers and pg_prewarm
For business-critical read workloads, having data preloaded in shared_buffers
enhances performance. Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL support the pg_prewarm
extension, which provides a convenient way to load relational data into the cache. In the preceding example, if you determine that pgbench_accounts_1
is a business-critical table and publishing a report is scheduled on the table, you can preload the data by running pg_prewarm
for this table, as shown in the following code. Note that prewarming data pushes out the shared_buffers
content based on an LRU basis, which can impact other read workloads.
Conclusion
In this post, we explored the concept of shared_buffers
and the pg_buffercache
extension for Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL. The appropriate value for shared_buffers
directly impacts the performance of PostgreSQL. Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL support the pg_buffercache
extension, which provides a convenient way to get information about the buffer cache of a PostgreSQL database. CloudWatch metrics and Performance Insights counters help determine the workloads served by cached data. By understanding the state of the buffer cache, you can identify areas for performance optimization and fine-tune the configuration of PostgreSQL databases.
If you have any questions or suggestions about this post, leave a comment.
About the Authors
Kiran Singh is a Senior Partner Solutions Architect and an Amazon RDS and Amazon Aurora specialist at AWS focusing on relational databases. She helps customers and partners build highly optimized, scalable, and secure solutions; modernize their architectures; and migrate their database workloads to AWS.
Vivek Singh is a Principal Database Specialist Technical Account Manager with AWS focusing on Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL engines. He works with enterprise customers providing technical assistance on PostgreSQL operational performance and sharing database best practices. He has over 17 years of experience in open-source database solutions, and enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.