AWS Database Blog
Strategy for Migrating Partitioned Tables from Oracle to Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL Compatibility
Sandeep Kariro is a big data architect and Akm Raziul is a database architect at Amazon Web Services.
Enterprises have been using closed-source database systems for more than four decades. Managing data with these systems has been a critical part of running a successful business in every organization. You have to make a considerable investment up front to obtain the required hardware and enterprise licenses before you can even start building the system. With cloud computing gaining momentum in recent years and technology maturing in the open source system world, interest has been growing in moving critical traditional online transaction processing (OLTP) database workloads to open-source systems such as PostgreSQL, MySQL, and others.
Migrating one database system to another requires significant effort and planning. Even though the basics around the database system generally stay the same, implementation and operational activity on each of them can vary quite a bit.
The AWS Schema Conversion Tool (AWS SCT) helps migrate Oracle schemas and custom stored procedure code to a PostgreSQL-compatible target with minimal effort. AWS SCT generates a migration assessment report that clearly identifies objects and code that require manual conversion or rework. One critical need for high volume OLTP applications is to have the data partitioned for performance reasons. During the initial assessment, AWS SCT highlights the Oracle partitions that require manual intervention. This is due to the way current versions of PostgreSQL (as of 9.6.2) handle table partitions. PostgreSQL manages partitioned tables differently from Oracle, SQL Server, or MySQL database partitions.
In this post, we discuss how to convert Oracle Database native range and list partitions to PostgreSQL native range and list partitions.
Range and list partition definitions in Oracle
Range: Each partition holds a range of values based on the partition key. For example, you can partition data based on a date field. Each partition can hold data for a date range such as a week, a month, a quarter, etc. The first partition of the table is always open-ended toward lower values. You can optionally set the last partition to be open-ended also by defining the maximum partition value as MAXVALUE
. A range partition can have up to 16 columns.
The following is an example of a table created with a range partition on create date. The table is created with three partitions. The first partition holds all data created with a create date earlier than 31-DEC-2016
. The second partition holds all data created with a create date between 01-JAN-2017
and 31-DEC-2017
. The third partition holds all other data created with a create date after 31-DEC-2017
.
List: Each partition holds a set of predefined values that are defined when the table is created. You can create a catch-all DEFAULT
partition to store all the other data that doesn’t fit into the predefined partition definitions. Heap table partitions can have up to 16 columns. Index-organized tables support one-column partitions only.
The following is an example of a table created with a list partition on a gender column. The table is created with three partitions. The first partition holds all data created with gender M
. The second partition holds all data created with gender F. The third partition holds all data not covered by the other two partitions.
DML operations on partitioned tables in Oracle
All Data Manipulation Language (DML) operations are performed on the base table. Oracle automatically adds or deletes the data from the correct partition. Operations can also be explicitly executed at the partition/subpartition level.
Index types supported on partitioned tables on the Oracle
Oracle Database supports indexes on partitioned tables. Indexes can follow the partition model that is inherited from the table definition. Such indexes are called local partitioned indexes. Oracle also supports indexes on the entire table or having its own partitions defined. Such indexes are called global indexes. Global partitioned indexes can be partitioned using range or hash partitioning. Composite partitioning is not supported for global partitioned indexes.
The following are examples of different indexes that are supported by Oracle Database:
Local index:
Global index:
Global partitioned index:
Range and list partitions in PostgreSQL and key differences
PostgreSQL allows table partitioning via table inheritance. Each partition translates to an individual child table, which is inherited from a parent table. The single parent table carries the table definitions with no data, whereas each child table of the parent table applies the partition criteria and its corresponding data.
Similarly, indexing on PostgreSQL partitions is handled through child (partitioned) tables. It doesn’t have the notion of a “global index” like Oracle does. However, its indexes work more closely just like Oracle local partitioned indexes, but with some additional features. For example, the PostgreSQL partition allows you to create indexes on some partitions only. It also lets you delete indexes from some partitions without affecting the indexes, rows, and performance of other partitions. Indexes on each partition are independent in their 1:1 relationship to their corresponding partitions. Since indexes are added explicitly on each partitioned table as needed, additional query optimization can easily be achieved without affecting other tables.
PostgreSQL terminology
For the purposes of this blog, you should be familiar with a few PostgreSQL internal partition terms:
- Parent table: The main table that carries the definition of the table except the partition criteria. The child table inherits the parent table definitions. For partition context, this parent table should not have its own rows. All the rows are inherited by its child tables/partitions.
- Child table: The inherited table from the parent that contains all the data. Child tables inherit the Data Definition Language (DDL) structure from the parent table and apply all the partition criteria with the CHECK These child tables act as partitioned tables.
- Check constraint: One of the column-level constraint types in PostgreSQL that evaluates whether the value of a certain column is true (Boolean).
Partitioning steps
When you’re converting from Oracle range or list partitioned tables to PostgreSQL range or list partitions, you should follow a few basic steps to get all the partition benefits in the PostgreSQL database. Here are the partitioning steps that must be done on the PostgreSQL side during the Oracle partitioned table conversion:
- Create a parent table similar to the Oracle partition table where partition criteria are excluded.
- Create child tables along with the check constraints where Oracle partition criteria are included.
- Make sure that the constraint parameter is set to on/partition.
- Create a procedure function and trigger to handle the inserts on the parent table (optional).
Conversion of range partitions, DML operations, queries, and indexes
The following sections describe in detail creating range partitions in PostgreSQL. We describe additional engine settings that must be turned on for PostgreSQL to do partition pruning during table scans. In addition, we describe all other operations related to partitioned tables.
Step 1: Create a parent table similar to an Oracle partition table where partition criteria are excluded.
Here the main body of the table is created without stating any partition criteria. You use the same Oracle table definition and convert that to a PostgreSQL parent table:
Step 2: Create child tables along with the check constraints where Oracle partition criteria are included.
To check whether the constraints were applied properly:
Step 3: Make sure that the constraint parameter is set to on/partition.
The constraint
parameter is set to partition
by default. This is one of the important parameters for the query planner to improve query performance by examining the constraints. Without that, the partitions are not recognized by the Optimizer engine. Double-check to make sure that this parameter is not set to off.
Unlike Oracle, inserts must be performed on their corresponding child/partition tables. A direct insert to the parent table (without applying step 4) does not inherit all the check constraint conditions. Therefore, those rows aren’t under partition tables and simply stay under the original parent table. In step 4, you create procedures and triggers to handle the direct insert from the parent table. For a quick demonstration, we insert directly to the child tables here:
Insert is successful on the p1 table.
Insert failed as the values are outside the check constraint.
Insert is successful on the p2 table.
Now, if you query the parent table, it reads all the rows from all of its corresponding child/partitioned tables.
If you check the query plan now, it shows that the partition is in use. It scans only the correct partition table rather than scanning all the partition tables.
Now if you temporarily set the Check
constraint parameter (as explained in step 3), you see that the partition is not in use anymore. The simple query ends up scanning all the partition tables.
As discussed previously, PostgreSQL lets you add indexes to the individual partitions to further optimize the row scanning inside the corresponding partitions:
Step 4: Create a procedure function and trigger to handle the inserts on the parent table (optional).
Now you can create a procedure function to make sure that the insert is handled directly from the parent table rather than inserting selectively to the partitioned tables.
Create a procedure to validate constraints during the insert:
Create a trigger on the parent table to call the validation procedure during the inserts:
A direct insert to the parent table is possible now due to the previously created procedure and trigger:
Conversion of list partitions, DML operations, and queries
You also apply all four steps for list partitions.
Step 1. Create a parent table similar to the Oracle partition table where partition criteria are excluded.
Step 2: Create child tables and the check constraints where the Oracle partition criteria are included.
Step 3: Make sure that the constraint parameter is set to on/partition.
Step 4: Create a procedure function and trigger to handle the inserts on the parent table (optional).
Create a procedure to validate constraints during the insert:
Create a trigger on the parent table to call the validation procedure during the inserts:
Now if you temporarily set the Check
constraint parameter (as explained in step 3), you see that the partition is not in use anymore. The simple query ends up scanning all the partition tables:
Other considerations for DML operations on partitioned tables in PostgreSQL
- To insert any new rows, a direct insert to the parent table (person) is good because in step 4, you created functions and triggers to handle direct inserts.
- A direct delete is possible from the parent table without having any intermediary procedures/functions (for example, step 4).
- If there are bulk inserts that fall under the same constraint criteria, we recommend that you bulk insert them directly to the partitioned tables for better performance.
- A direct update is possible as long as it meets the check constraint criteria of the existing child table. Anything outside that criteria requires a new insert and delete:
Summary
In this post, we highlighted the key differences between range and list partitions in source (Oracle) and target (PostgreSQL) systems. We also went over other operational and maintenance operations in both systems and how they differ.