Strategy for Migrating Partitioned Tables from Oracle to Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL Compatibility

Sandeep Kariro is a big data architect and Akm Raziul is a database architect at Amazon Web Services.

Enterprises have been using closed-source database systems for more than four decades. Managing data with these systems has been a critical part of running a successful business in every organization. You have to make a considerable investment up front to obtain the required hardware and enterprise licenses before you can even start building the system. With cloud computing gaining momentum in recent years and technology maturing in the open source system world, interest has been growing in moving critical traditional online transaction processing (OLTP) database workloads to open-source systems such as PostgreSQL, MySQL, and others.

Migrating one database system to another requires significant effort and planning. Even though the basics around the database system generally stay the same, implementation and operational activity on each of them can vary quite a bit.

The AWS Schema Conversion Tool (AWS SCT) helps migrate Oracle schemas and custom stored procedure code to a PostgreSQL-compatible target with minimal effort. AWS SCT generates a migration assessment report that clearly identifies objects and code that require manual conversion or rework. One critical need for high volume OLTP applications is to have the data partitioned for performance reasons. During the initial assessment, AWS SCT highlights the Oracle partitions that require manual intervention. This is due to the way current versions of PostgreSQL (as of 9.6.2) handle table partitions. PostgreSQL manages partitioned tables differently from Oracle, SQL Server, or MySQL database partitions.

In this post, we discuss how to convert Oracle Database native range and list partitions to PostgreSQL native range and list partitions.

Range and list partition definitions in Oracle
Range: Each partition holds a range of values based on the partition key. For example, you can partition data based on a date field. Each partition can hold data for a date range such as a week, a month, a quarter, etc. The first partition of the table is always open-ended toward lower values. You can optionally set the last partition to be open-ended also by defining the maximum partition value as MAXVALUE. A range partition can have up to 16 columns.

The following is an example of a table created with a range partition on create date. The table is created with three partitions. The first partition holds all data created with a create date earlier than 31-DEC-2016. The second partition holds all data created with a create date between 01-JAN-2017 and 31-DEC-2017. The third partition holds all other data created with a create date after 31-DEC-2017.

CREATE TABLE person 
	(id NUMBER(10) NOT NULL
	,name VARCHAR2(50)
	,gender VARCHAR2(1)
	,create_dt DATE DEFAULT sysdate
	)
PARTITION BY RANGE(create_dt) (
  partition p1 values less than ('01-JAN-2017'), 
  partition p2 values less than ('01-JAN-2018'), 
  partition p3 values less than (MAXVALUE)
);

List: Each partition holds a set of predefined values that are defined when the table is created. You can create a catch-all DEFAULT partition to store all the other data that doesn’t fit into the predefined partition definitions. Heap table partitions can have up to 16 columns. Index-organized tables support one-column partitions only.

The following is an example of a table created with a list partition on a gender column. The table is created with three partitions. The first partition holds all data created with gender M. The second partition holds all data created with gender F. The third partition holds all data not covered by the other two partitions.

CREATE TABLE person 
	(id NUMBER(10) NOT NULL
	,name VARCHAR2(50)
	,gender VARCHAR2(1)
	,create_dt DATE DEFAULT sysdate
	)
PARTITION BY LIST(gender) (
  partition M values('M'), 
  partition F values('F'), 
  partition O values(DEFAULT)
);

DML operations on partitioned tables in Oracle

All Data Manipulation Language (DML) operations are performed on the base table. Oracle automatically adds or deletes the data from the correct partition. Operations can also be explicitly executed at the partition/subpartition level.

Index types supported on partitioned tables on the Oracle

Oracle Database supports indexes on partitioned tables. Indexes can follow the partition model that is inherited from the table definition. Such indexes are called local partitioned indexes. Oracle also supports indexes on the entire table or having its own partitions defined. Such indexes are called global indexes. Global partitioned indexes can be partitioned using range or hash partitioning. Composite partitioning is not supported for global partitioned indexes.

The following are examples of different indexes that are supported by Oracle Database:

Local index:

CREATE INDEX person_local_idx ON person (id) LOCAL;

Global index:

CREATE INDEX person_global_idx ON person(id);

Global partitioned index:

CREATE INDEX person_global_part_idx ON person(id)
GLOBAL PARTITION BY RANGE(id)
(PARTITION p1 VALUES LESS THAN(500000),
 PARTITION p2 VALUES LESS THAN(MAXVALUE));

Range and list partitions in PostgreSQL and key differences

PostgreSQL allows table partitioning via table inheritance. Each partition translates to an individual child table, which is inherited from a parent table. The single parent table carries the table definitions with no data, whereas each child table of the parent table applies the partition criteria and its corresponding data.

Similarly, indexing on PostgreSQL partitions is handled through child (partitioned) tables. It doesn’t have the notion of a “global index” like Oracle does. However, its indexes work more closely just like Oracle local partitioned indexes, but with some additional features. For example, the PostgreSQL partition allows you to create indexes on some partitions only. It also lets you delete indexes from some partitions without affecting the indexes, rows, and performance of other partitions. Indexes on each partition are independent in their 1:1 relationship to their corresponding partitions. Since indexes are added explicitly on each partitioned table as needed, additional query optimization can easily be achieved without affecting other tables.

PostgreSQL terminology

For the purposes of this blog, you should be familiar with a few PostgreSQL internal partition terms:

Parent table: The main table that carries the definition of the table except the partition criteria. The child table inherits the parent table definitions. For partition context, this parent table should not have its own rows. All the rows are inherited by its child tables/partitions.
Child table: The inherited table from the parent that contains all the data. Child tables inherit the Data Definition Language (DDL) structure from the parent table and apply all the partition criteria with the CHECK These child tables act as partitioned tables.
Check constraint: One of the column-level constraint types in PostgreSQL that evaluates whether the value of a certain column is true (Boolean).

Partitioning steps

When you’re converting from Oracle range or list partitioned tables to PostgreSQL range or list partitions, you should follow a few basic steps to get all the partition benefits in the PostgreSQL database. Here are the partitioning steps that must be done on the PostgreSQL side during the Oracle partitioned table conversion:

Create a parent table similar to the Oracle partition table where partition criteria are excluded.
Create child tables along with the check constraints where Oracle partition criteria are included.
Make sure that the constraint parameter is set to on/partition.
Create a procedure function and trigger to handle the inserts on the parent table (optional).

Conversion of range partitions, DML operations, queries, and indexes

The following sections describe in detail creating range partitions in PostgreSQL. We describe additional engine settings that must be turned on for PostgreSQL to do partition pruning during table scans. In addition, we describe all other operations related to partitioned tables.

Step 1: Create a parent table similar to an Oracle partition table where partition criteria are excluded.

Here the main body of the table is created without stating any partition criteria. You use the same Oracle table definition and convert that to a PostgreSQL parent table:

CREATE TABLE person 
(id NUMERIC(10) NOT NULL
,name VARCHAR(50)
,gender VARCHAR(1)
,create_dt DATE DEFAULT current_date
);

Step 2: Create child tables along with the check constraints where Oracle partition criteria are included.

Create table  p1 (check ("create_dt" <'2017-01-01')) inherits ("person");
Create table  p2 (check ("create_dt" <'2018-01-01')) inherits ("person");
Create table  p3 (check ("create_dt" >='2018-01-01' )) inherits ("person");

To check whether the constraints were applied properly:

post=> select inhrelid::regclass, inhparent::regclass  from pg_inherits where inhparent='person'::regclass;
 inhrelid | inhparent
----------+-----------
 p3       | person
 p1       | person
 p2       | person

Step 3: Make sure that the constraint parameter is set to on/partition.

The constraint parameter is set to partition by default. This is one of the important parameters for the query planner to improve query performance by examining the constraints. Without that, the partitions are not recognized by the Optimizer engine. Double-check to make sure that this parameter is not set to off.

post=> select name,setting from pg_settings where name ='constraint_exclusion';
         name         |  setting
----------------------+-----------
 constraint_exclusion | partition

Unlike Oracle, inserts must be performed on their corresponding child/partition tables. A direct insert to the parent table (without applying step 4) does not inherit all the check constraint conditions. Therefore, those rows aren’t under partition tables and simply stay under the original parent table. In step 4, you create procedures and triggers to handle the direct insert from the parent table. For a quick demonstration, we insert directly to the child tables here:

post=> insert into p1 values (1,'john','M','2015-04-05');
INSERT 0 1

Insert is successful on the p1 table.

post=> insert into p1 values (1,'john','M','2016-04-05');
ERROR:  new row for relation "p1" violates check constraint "p1_create_dt_check"
DETAIL:  Failing row contains (1, john, M, 2016-04-05).

Insert failed as the values are outside the check constraint.

post=> insert into p2 values (1,'john','M','2016-04-05');
INSERT 0 1

Insert is successful on the p2 table.

post=> insert into p2 values (2,'john','M','2016-04-05'),(3,'ashley','F','2016-03-05');
INSERT 0 2

Now, if you query the parent table, it reads all the rows from all of its corresponding child/partitioned tables.

post=> select *from person;
 id |  name  | gender | create_dt
----+--------+--------+------------
  1 | john   | M      | 2015-04-05
  1 | john   | M      | 2016-04-05
  2 | john   | M      | 2016-04-05
  3 | ashley | F      | 2016-03-05
(4 rows)
 
post=> select *from person where create_dt='2016-03-05';
 id |  name  | gender | create_dt
----+--------+--------+------------
  3 | ashley | F      | 2016-03-05
(1 row)

If you check the query plan now, it shows that the partition is in use. It scans only the correct partition table rather than scanning all the partition tables.

post=> explain select *from person where create_dt='2016-03-05';
                          QUERY PLAN
--------------------------------------------------------------
 Append  (cost=0.00..17.32 rows=3 width=146)
   ->  Seq Scan on person  (cost=0.00..1.58 rows=1 width=146)
         Filter: (create_dt = '2016-03-05'::date)
   ->  Seq Scan on p2  (cost=0.00..15.75 rows=2 width=146)
         Filter: (create_dt = '2016-03-05'::date)
(5 rows)

Now if you temporarily set the Check constraint parameter (as explained in step 3), you see that the partition is not in use anymore. The simple query ends up scanning all the partition tables.

post=> set constraint_exclusion =off;
SET

post=> explain select *from person where create_dt='2016-03-05';
                          QUERY PLAN
--------------------------------------------------------------
 Append  (cost=0.00..48.83 rows=7 width=146)
   ->  Seq Scan on person  (cost=0.00..1.58 rows=1 width=146)
         Filter: (create_dt = '2016-03-05'::date)
   ->  Seq Scan on p3  (cost=0.00..15.75 rows=2 width=146)
         Filter: (create_dt = '2016-03-05'::date)
   ->  Seq Scan on p1  (cost=0.00..15.75 rows=2 width=146)
         Filter: (create_dt = '2016-03-05'::date)
   ->  Seq Scan on p2  (cost=0.00..15.75 rows=2 width=146)
         Filter: (create_dt = '2016-03-05'::date)

As discussed previously, PostgreSQL lets you add indexes to the individual partitions to further optimize the row scanning inside the corresponding partitions:

Create index p1_create_dt_idx on p1(create_dt);
Create index p2_create_dt_idx on p2(create_dt);
Create index p3_create_dt_idx on p3(create_dt);

Step 4: Create a procedure function and trigger to handle the inserts on the parent table (optional).

Now you can create a procedure function to make sure that the insert is handled directly from the parent table rather than inserting selectively to the partitioned tables.

Create a procedure to validate constraints during the insert:

CREATE OR REPLACE FUNCTION person_trigger()
RETURNS TRIGGER AS $$
BEGIN
  IF (NEW."create_dt"< '2017-01-01' ) THEN
    INSERT INTO p1 VALUES (NEW.*);
  ELSIF (NEW."create_dt"< '2018-01-01' ) THEN
    INSERT INTO p2 VALUES (NEW.*);
  ELSIF (NEW."create_dt">= '2018-01-01') THEN
    INSERT INTO p3 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION ' Person table Date is not according to the set range. Please fix the person_trigger() function.!';
  END IF;
  RETURN NULL;
END;
$$ 
LANGUAGE plpgsql;

Create a trigger on the parent table to call the validation procedure during the inserts:

create trigger person_insert
    before insert on person
    for each row execute procedure person_trigger();

A direct insert to the parent table is possible now due to the previously created procedure and trigger:

post=> insert into person values (1,'Raziu','M','2016-04-05');
INSERT 0 0

post=> select *from person;
 id |  name  | gender | create_dt
----+--------+--------+------------
  1 | john   | M      | 2015-04-05
  1 | john   | M      | 2016-04-05
  2 | john   | M      | 2016-04-05
  3 | ashley | F      | 2016-03-05
 1  | Raziul |  M     | 2016-04-05
(4 rows)

Conversion of list partitions, DML operations, and queries

You also apply all four steps for list partitions.

Step 1. Create a parent table similar to the Oracle partition table where partition criteria are excluded.

CREATE TABLE personList 
(id NUMERIC(10) NOT NULL
,name VARCHAR(50)
,gender VARCHAR(1)
,create_dt DATE DEFAULT current_date
);

Step 2: Create child tables and the check constraints where the Oracle partition criteria are included.

Create table  M (check ("gender" = 'M')) inherits ("personlist");
Create table  F (check ("gender" = 'F')) inherits ("personlist");
Create table O (check ("gender" <> 'M'  and "gender" <> 'F' )) inherits ("personlist");

Step 3: Make sure that the constraint parameter is set to on/partition.

post=> select name,setting from pg_settings where name ='constraint_exclusion';
         name         |  setting
----------------------+-----------
 constraint_exclusion | partition

Step 4: Create a procedure function and trigger to handle the inserts on the parent table (optional).

Create a procedure to validate constraints during the insert:

CREATE OR REPLACE FUNCTION personlist_trigger()
RETURNS TRIGGER AS $$
BEGIN
  IF (NEW."gender"= 'M' ) THEN
    INSERT INTO M VALUES (NEW.*);
  ELSIF (NEW."gender"='F') THEN
    INSERT INTO F VALUES (NEW.*);
      ELSE
INSERT INTO O VALUES (NEW.*);
  END IF;
  RETURN NULL;
END;
$$ 
LANGUAGE plpgsql;

Create a trigger on the parent table to call the validation procedure during the inserts:

create trigger personlist_insert
    before insert on personlist
    for each row execute procedure personlist_trigger();
 
post=> insert into personlist values (1,'ashley','F','2018-04-05');
INSERT 0 0

post=> insert into personlist  values (1,'john','M','2018-04-05');
INSERT 0 0

post=> insert into personlist  values (1,'Others','O','2018-04-05');
INSERT 0 0

post=> select *from M;
 id | name | gender | create_dt
----+------+--------+------------
  1 | john | M      | 2018-04-05
(1 row)
 
post=> select *from F;
 id |  name  | gender | create_dt
----+--------+--------+------------
  1 | ashley | F      | 2018-04-05
(1 row)
 

post=> select *from O;
 id |  name  | gender | create_dt
----+--------+--------+------------
  1 | Others | O      | 2018-04-05
(1 row)
 
post=> select *from personlist;
 id |  name  | gender | create_dt
----+--------+--------+------------
  1 | john   | M      | 2018-04-05
  1 | ashley | F      | 2018-04-05
  1 | Others | O      | 2018-04-05
(3 rows)

post=> set constraint_exclusion =on;
SET

post=> explain select *from personlist where gender='F';
                            QUERY PLAN
------------------------------------------------------------------
 Append  (cost=0.00..17.32 rows=3 width=146)
   ->  Seq Scan on personlist  (cost=0.00..1.58 rows=1 width=146)
         Filter: ((gender)::text = 'F'::text)
   ->  Seq Scan on f  (cost=0.00..15.75 rows=2 width=146)
         Filter: ((gender)::text = 'F'::text)
(5 rows)

Now if you temporarily set the Check constraint parameter (as explained in step 3), you see that the partition is not in use anymore. The simple query ends up scanning all the partition tables:

post=> set constraint_exclusion =off;
SET


post=> explain select *from personlist where gender='F';
                            QUERY PLAN
------------------------------------------------------------------
 Append  (cost=0.00..48.83 rows=7 width=146)
   ->  Seq Scan on personlist  (cost=0.00..1.58 rows=1 width=146)
         Filter: ((gender)::text = 'F'::text)
   ->  Seq Scan on m  (cost=0.00..15.75 rows=2 width=146)
         Filter: ((gender)::text = 'F'::text)
   ->  Seq Scan on f  (cost=0.00..15.75 rows=2 width=146)
         Filter: ((gender)::text = 'F'::text)
   ->  Seq Scan on o  (cost=0.00..15.75 rows=2 width=146)
         Filter: ((gender)::text = 'F'::text)
(9 rows)

Other considerations for DML operations on partitioned tables in PostgreSQL

To insert any new rows, a direct insert to the parent table (person) is good because in step 4, you created functions and triggers to handle direct inserts.
A direct delete is possible from the parent table without having any intermediary procedures/functions (for example, step 4).
If there are bulk inserts that fall under the same constraint criteria, we recommend that you bulk insert them directly to the partitioned tables for better performance.
A direct update is possible as long as it meets the check constraint criteria of the existing child table. Anything outside that criteria requires a new insert and delete:

post=> update person set create_dt='2015-04-05' where id=1;
ERROR:  new row for relation "p3" violates check constraint "p3_create_dt_check"
DETAIL:  Failing row contains (1, john, M, 2015-04-05).

Summary

In this post, we highlighted the key differences between range and list partitions in source (Oracle) and target (PostgreSQL) systems. We also went over other operational and maintenance operations in both systems and how they differ.

AWS Database Blog

Strategy for Migrating Partitioned Tables from Oracle to Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL Compatibility

Resources

Blog Topics

Follow