AWS Database Blog
Migrate Oracle bulk binds to Amazon Aurora PostgreSQL-Compatible Edition or Amazon RDS for PostgreSQL
To migrate an Oracle database to Amazon Aurora PostgreSQL-Compatible Edition, you usually need to perform both automated and manual tasks. The automated tasks include schema conversion and data migration, which can be handled with the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS), respectively. The manual tasks involve post-migration touch-ups for certain database objects that AWS SCT can't convert automatically.
The AWS SCT automatically converts the source database schema and a majority of the custom code to a format compatible with the target database. In a database migration from Oracle to PostgreSQL, the AWS SCT automates the conversion of Oracle PL/SQL code to equivalent PL/pgSQL code in PostgreSQL.
When migrating from Oracle to PostgreSQL, it’s very common to come across bulk binds (such as BULK COLLECT and FORALL). A bulk bind is a feature in Oracle databases that allows multiple rows of data to be fetched or processed at once, rather than one row at a time. It works by binding a collection of values to a single variable, which is then processed by the database in a single operation.
In this post, we provide a brief overview of how Oracle bulk binds work, and show you how to migrate them to a database running on Amazon Aurora PostgreSQL-Compatible Edition. We also review some challenges you may encounter during the conversion and how you can resolve them.
Solution overview
Oracle PL/SQL allows you to fetch records in bulk rather than one by one. BULK COLLECT reduces context switches between the SQL and PL/SQL engine and allows the SQL engine to fetch the records all at once. The advantage of using BULK COLLECT is that it increases performance by reducing the interaction between the database and PL/SQL engine.
PostgreSQL provides the array function ARRAY_AGG, which you can use to achieve processing logic similar to Oracle's. In this post, we discuss different approaches to using BULK COLLECT and how to migrate each of them to PostgreSQL. We also discuss common mistakes and their solutions when using ARRAY_AGG as an alternative to BULK COLLECT.
Prerequisites
To get started with the solutions described in this post, you need the following:
- An active AWS account
- A source Oracle database (on premises or Amazon RDS for Oracle)
- A target Aurora PostgreSQL database
- A database user with the following privileges on the target database (replace YOUR_SCHEMA_NAME and YOUR_DB_USER with your schema and user, respectively):
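The original grant statements aren't reproduced here; the following is a minimal sketch of privileges that allow the user to create and run the examples in this post (the exact set depends on your environment):

```sql
-- Run as a privileged user on the target Aurora PostgreSQL database.
-- YOUR_SCHEMA_NAME and YOUR_DB_USER are placeholders; replace them
-- with your schema and user.
GRANT USAGE, CREATE ON SCHEMA YOUR_SCHEMA_NAME TO YOUR_DB_USER;
GRANT SELECT, INSERT, UPDATE, DELETE
    ON ALL TABLES IN SCHEMA YOUR_SCHEMA_NAME TO YOUR_DB_USER;
ALTER DEFAULT PRIVILEGES IN SCHEMA YOUR_SCHEMA_NAME
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO YOUR_DB_USER;
```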
Use the following code to create the sample tables in Oracle and PostgreSQL and insert data into them.
The following is the Oracle code:
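The original listing isn't reproduced here; the following is a minimal sketch, assuming two sample tables named test_table and test_table2 with an id and a name column (the table and column names are illustrative):

```sql
-- Oracle: sample tables and data.
CREATE TABLE test_table (
    id   NUMBER PRIMARY KEY,
    name VARCHAR2(50)
);

CREATE TABLE test_table2 (
    id   NUMBER PRIMARY KEY,
    name VARCHAR2(50)
);

BEGIN
    FOR i IN 1 .. 10 LOOP
        INSERT INTO test_table VALUES (i, 'employee_' || i);
    END LOOP;
    COMMIT;
END;
/
```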
The following is the PostgreSQL code:
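A minimal PostgreSQL equivalent of the assumed sample schema, with the same illustrative table and column names:

```sql
-- PostgreSQL: equivalent sample tables and data.
CREATE TABLE test_table (
    id   INTEGER PRIMARY KEY,
    name CHARACTER VARYING(50)
);

CREATE TABLE test_table2 (
    id   INTEGER PRIMARY KEY,
    name CHARACTER VARYING(50)
);

-- generate_series produces rows 1..10 to populate the sample data.
INSERT INTO test_table (id, name)
SELECT i, 'employee_' || i
FROM generate_series(1, 10) AS i;
```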
The following sections demonstrate how BULK COLLECT works in Oracle and how to migrate functions using BULK COLLECT into PostgreSQL.
BULK COLLECT using FETCH
The FETCH statement is used to retrieve a single row of data from a cursor. When used with BULK COLLECT, the FETCH statement retrieves multiple rows of data at once.
The following PL/SQL program declares a cursor and an associative array, and demonstrates how to use BULK COLLECT to retrieve all rows in a single FETCH:
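The original program isn't shown here; the following is a sketch of the pattern, assuming the sample test_table defined earlier (procedure and variable names are illustrative):

```sql
CREATE OR REPLACE PROCEDURE fetch_bulk_collect
AS
    CURSOR c_data IS
        SELECT name FROM test_table;
    -- Associative array matching the cursor's column type.
    TYPE t_tab IS TABLE OF test_table.name%TYPE INDEX BY PLS_INTEGER;
    l_tab t_tab;
BEGIN
    OPEN c_data;
    -- A single FETCH retrieves every row the cursor returns.
    FETCH c_data BULK COLLECT INTO l_tab;
    CLOSE c_data;

    FOR i IN 1 .. l_tab.COUNT LOOP
        DBMS_OUTPUT.PUT_LINE('Name: ' || l_tab(i));
    END LOOP;
END;
/
```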
We get the following output.
Now let’s convert the Oracle example to the equivalent PL/pgSQL code.
PL/pgSQL in PostgreSQL doesn’t currently have the BULK COLLECT syntax. However, we can use the ARRAY_AGG aggregate function to run queries that perform bulk processing.
The first step is to declare an array of a type that matches the Oracle collection type. To do that, you have two options:
- Create a domain in PostgreSQL that is a type similar to the collection type in Oracle (for example, CREATE DOMAIN l_tab AS CHARACTER VARYING[])
- Directly declare an array variable in PL/pgSQL code (for example, l_tab CHARACTER VARYING[])
In the following example, we use ARRAY_AGG to fetch the c_data cursor data into the l_tab variable, mimicking the bulk processing functionality of BULK COLLECT in Oracle by directly declaring an array variable in PL/pgSQL code:
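The original PL/pgSQL listing isn't shown; the following is a sketch under the same assumptions. Note that PL/pgSQL's FETCH retrieves one row at a time, so the sketch aggregates the cursor's underlying query with ARRAY_AGG instead:

```sql
CREATE OR REPLACE FUNCTION fetch_bulk_collect() RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    l_tab CHARACTER VARYING[];   -- directly declared array variable
BEGIN
    -- Aggregate the c_data cursor's query result into an array in one
    -- operation, mimicking BULK COLLECT.
    SELECT ARRAY_AGG(name) INTO l_tab FROM test_table;

    FOR i IN 1 .. COALESCE(array_length(l_tab, 1), 0) LOOP
        RAISE NOTICE 'Name: %', l_tab[i];
    END LOOP;
END;
$$;

SELECT fetch_bulk_collect();
```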
The example produces the following output.
We recommend converting Oracle global objects to domains in PostgreSQL. Oracle global objects are accessible publicly across all the procedures and functions inside a package; creating a domain in PostgreSQL replicates the behavior of an Oracle global array.
BULK COLLECT in a SELECT statement
The previous section demonstrated how to migrate PL/SQL code using the BULK COLLECT feature with FETCH to its equivalent constructs in PL/pgSQL. Now let’s explore the different aspects of the BULK COLLECT mechanism and the steps required to migrate using a SELECT statement.
The following is an example of an Oracle function that uses BULK COLLECT in a SELECT statement:
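The original function isn't reproduced; a sketch of the pattern, assuming the sample test_table (function name and return value are illustrative):

```sql
CREATE OR REPLACE FUNCTION get_names RETURN VARCHAR2
AS
    TYPE t_tab IS TABLE OF test_table%ROWTYPE;
    l_tab t_tab;
BEGIN
    -- Fetch all rows into the collection with one SELECT.
    SELECT * BULK COLLECT INTO l_tab FROM test_table;

    FOR i IN 1 .. l_tab.COUNT LOOP
        DBMS_OUTPUT.PUT_LINE('Id: ' || l_tab(i).id || ' Name: ' || l_tab(i).name);
    END LOOP;
    RETURN 'Fetched ' || l_tab.COUNT || ' rows';
END;
/
```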
The example produces the following output.
In PostgreSQL, we can use the ARRAY_AGG function to achieve the bulk SELECT INTO statement functionality:
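The original listing isn't shown; a sketch of the PL/pgSQL equivalent under the same assumptions:

```sql
CREATE OR REPLACE FUNCTION get_names() RETURNS CHARACTER VARYING
LANGUAGE plpgsql
AS $$
DECLARE
    l_tab test_table[];   -- array of test_table row records
    rec   test_table;
BEGIN
    -- Aggregating the table alias t collects whole rows into the array.
    SELECT ARRAY_AGG(t) INTO l_tab FROM test_table t;

    FOREACH rec IN ARRAY l_tab LOOP
        RAISE NOTICE 'Id: % Name: %', rec.id, rec.name;
    END LOOP;
    RETURN 'Fetched ' || COALESCE(array_length(l_tab, 1), 0) || ' rows';
END;
$$;
```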
The example produces the following output.
In PostgreSQL, to achieve similar functionality of the BULK COLLECT statement, we used the ARRAY_AGG
function to aggregate the result of the query into an array. The result of the query is then stored in the l_tab
variable, which is an array of test_table
records.
BULK COLLECT with LIMIT
When using BULK COLLECT, you can also specify a LIMIT clause, which indicates the maximum number of rows to be fetched at once. This can be useful for managing memory usage and preventing the collection variable from becoming too large.
The following is an example of an Oracle function that uses BULK COLLECT with LIMIT:
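The original code isn't reproduced; a sketch of the pattern, assuming the sample test_table and an illustrative batch size of 3:

```sql
CREATE OR REPLACE PROCEDURE fetch_with_limit
AS
    CURSOR c_data IS
        SELECT name FROM test_table;
    TYPE t_tab IS TABLE OF test_table.name%TYPE INDEX BY PLS_INTEGER;
    l_tab t_tab;
BEGIN
    OPEN c_data;
    LOOP
        -- Fetch at most 3 rows per round trip to bound memory use.
        FETCH c_data BULK COLLECT INTO l_tab LIMIT 3;
        EXIT WHEN l_tab.COUNT = 0;
        FOR i IN 1 .. l_tab.COUNT LOOP
            DBMS_OUTPUT.PUT_LINE('Name: ' || l_tab(i));
        END LOOP;
    END LOOP;
    CLOSE c_data;
END;
/
```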
The example produces the following output.
In PostgreSQL, there is no LIMIT clause for array processing, but we can achieve similar batching by using LIMIT and OFFSET in the query:
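The original listing isn't shown; a sketch of the batching pattern under the same assumptions (the ORDER BY keeps the LIMIT/OFFSET windows stable):

```sql
CREATE OR REPLACE FUNCTION fetch_with_limit() RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    l_tab    CHARACTER VARYING[];
    l_name   CHARACTER VARYING;
    l_limit  INT := 3;   -- illustrative batch size
    l_offset INT := 0;
BEGIN
    LOOP
        -- Aggregate one batch at a time so the array never exceeds l_limit.
        SELECT ARRAY_AGG(name) INTO l_tab
        FROM (SELECT name FROM test_table
              ORDER BY id
              LIMIT l_limit OFFSET l_offset) t;
        EXIT WHEN l_tab IS NULL;   -- no rows left in this window

        FOREACH l_name IN ARRAY l_tab LOOP
            RAISE NOTICE 'Name: %', l_name;
        END LOOP;
        l_offset := l_offset + l_limit;
    END LOOP;
END;
$$;
```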
The example produces the following output.
BULK COLLECT with FORALL
The FORALL statement allows you to perform a single DML operation on all elements of a collection, rather than processing each element one at a time. When used with BULK COLLECT, it can significantly reduce the number of context switches between the SQL engine and the PL/SQL engine, which can speed up processing.
The following example shows how to use BULK COLLECT with FORALL with Oracle:
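The original example isn't reproduced; a sketch of the pattern, assuming the sample tables test_table and test_table2 defined earlier:

```sql
CREATE OR REPLACE PROCEDURE copy_bulk
AS
    TYPE t_tab IS TABLE OF test_table%ROWTYPE;
    l_tab t_tab;
BEGIN
    SELECT * BULK COLLECT INTO l_tab FROM test_table;

    -- A single context switch inserts every element of the collection.
    FORALL i IN 1 .. l_tab.COUNT
        INSERT INTO test_table2 VALUES l_tab(i);

    DBMS_OUTPUT.PUT_LINE('Rows inserted: ' || SQL%ROWCOUNT);
    COMMIT;
END;
/
```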
We get the following output.
In PostgreSQL, we can achieve a similar functionality of DML operations on data in bulk using UNNEST. This loads the complete data from the array variable into the table with a single SELECT statement.
We use the table test_table2 for BULK COLLECT with FORALL.
The following code achieves the BULK COLLECT with FORALL functionality in PostgreSQL using UNNEST:
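The original listing isn't shown; a sketch under the same assumptions:

```sql
CREATE OR REPLACE FUNCTION copy_bulk() RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    l_tab test_table[];
    l_cnt BIGINT;
BEGIN
    -- Collect all rows into an array (the BULK COLLECT step).
    SELECT ARRAY_AGG(t) INTO l_tab FROM test_table t;

    -- UNNEST expands the composite array back into rows, so the whole
    -- collection is inserted with a single statement (the FORALL step).
    INSERT INTO test_table2
    SELECT * FROM UNNEST(l_tab);

    GET DIAGNOSTICS l_cnt = ROW_COUNT;
    RAISE NOTICE 'Rows inserted: %', l_cnt;
END;
$$;
```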
The example produces the following output.
Debugging common errors
This section reviews common errors that you may encounter when using ARRAY_AGG as an alternative to BULK COLLECT and how you can resolve them.
We use the following sample source Oracle code to illustrate common errors we may face while refactoring the code to PostgreSQL.
The following is the Oracle code:
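The original source isn't reproduced; a minimal sketch, assuming a sample table test2(id NUMBER, name VARCHAR2) that the PostgreSQL examples in the following sections refactor (names are illustrative):

```sql
-- Oracle: sample table and a block that bulk-collects a single column.
CREATE TABLE test2 (
    id   NUMBER,
    name VARCHAR2(50)
);

INSERT INTO test2 VALUES (1, 'A');
INSERT INTO test2 VALUES (2, 'B');
COMMIT;

DECLARE
    TYPE t_tab IS TABLE OF test2.id%TYPE;
    l_ids t_tab;
BEGIN
    SELECT id BULK COLLECT INTO l_ids FROM test2;
    FOR i IN 1 .. l_ids.COUNT LOOP
        DBMS_OUTPUT.PUT_LINE('Id: ' || l_ids(i));
    END LOOP;
END;
/
```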
The following sections show PostgreSQL code.
ARRAY_AGG on a single column
This section focuses on common issues while using ARRAY_AGG
on a single column of a table.
In our first example, we use a single column with the row operator in ARRAY_AGG:
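The original failing snippet isn't shown; a minimal sketch of the pattern, assuming a sample table test2(id integer, name text):

```sql
-- PostgreSQL: sample table for the examples in this section.
CREATE TABLE test2 (
    id   INTEGER,
    name TEXT
);
INSERT INTO test2 VALUES (1, 'A'), (2, 'B');

DO $$
DECLARE
    arr INTEGER[];
BEGIN
    -- ROW(id) wraps each value in a row, so the aggregate produces an
    -- array of rows such as {"(1)","(2)"}, not an array of integers;
    -- assigning it to an INTEGER[] variable fails.
    SELECT ARRAY_AGG(ROW(id)) INTO arr FROM test2;
    RAISE NOTICE 'Ids: %', arr;
END;
$$;
```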
We get the following error when using a row operator on a single column along with ARRAY_AGG.
In the preceding output, the value fetched from the array is a row instead of an integer, which causes the error. To resolve this issue, when working with ARRAY_AGG on a single column, don't convert the integer into a row. The solution is to use only the column whose values we want returned as an array in the ARRAY_AGG expression.
The following is the correct approach to using ARRAY_AGG on a single column:
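The original listing isn't shown; a sketch of the corrected pattern, assuming a sample table test2(id integer, name text):

```sql
DO $$
DECLARE
    arr  INTEGER[];
    l_id INTEGER;
BEGIN
    -- Aggregate the column itself, so each array element is an integer.
    SELECT ARRAY_AGG(id) INTO arr FROM test2;
    FOREACH l_id IN ARRAY arr LOOP
        RAISE NOTICE 'Id: %', l_id;
    END LOOP;
END;
$$;
```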
We get the following output.
ARRAY_AGG on multiple columns
You may also run into issues when using ARRAY_AGG on multiple columns. For example, see the following code:
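The original snippet isn't shown; a sketch of the failing pattern, assuming a sample table test2(id integer, name text):

```sql
DO $$
DECLARE
    arr test2[];
BEGIN
    -- ARRAY_AGG takes a single argument; passing two columns looks for a
    -- two-argument array_agg(integer, text), which doesn't exist.
    SELECT ARRAY_AGG(id, name) INTO arr FROM test2;
END;
$$;
```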
The example produces the following output.
In this example, we used individual column names with the ARRAY_AGG function. As a result, we get an error because the function ARRAY_AGG(integer, text) doesn't exist. To resolve this error, let's discuss two approaches to using multiple columns with the ARRAY_AGG function.
In our first approach, we use a row operator inside ARRAY_AGG, which first converts the columns into rows. The ARRAY_AGG function then loads the rows into an arr variable, which is an array of the test2 table type. See the following code:
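The original listing isn't shown; a sketch of the row-operator approach, assuming a sample table test2(id integer, name text):

```sql
DO $$
DECLARE
    arr test2[];
    rec test2;
BEGIN
    -- ROW(id, name)::test2 turns the columns into a single composite value,
    -- which ARRAY_AGG can aggregate into an array of test2 rows.
    SELECT ARRAY_AGG(ROW(id, name)::test2) INTO arr FROM test2;
    FOREACH rec IN ARRAY arr LOOP
        RAISE NOTICE 'Id: % Name: %', rec.id, rec.name;
    END LOOP;
END;
$$;
```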
The example produces the following output.
In our second approach, let's say we have n columns over which we have to do a multi-column insert. In this scenario, instead of specifying all the columns, we can use the table alias with the * operator:
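The original listing isn't shown; a sketch of the alias approach, assuming a sample table test2(id integer, name text):

```sql
DO $$
DECLARE
    arr test2[];
    rec test2;
BEGIN
    -- t.* aggregates the whole row without listing every column by name.
    SELECT ARRAY_AGG(t.*) INTO arr FROM test2 t;
    FOREACH rec IN ARRAY arr LOOP
        RAISE NOTICE 'Id: % Name: %', rec.id, rec.name;
    END LOOP;
END;
$$;
```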
We get the following output.
Conclusion
In this post, we shared step-by-step instructions for migrating Oracle bulk binds to Amazon Aurora PostgreSQL-Compatible Edition or Amazon RDS for PostgreSQL, and provided code templates, SQL scripts, and best practices. We also discussed common mistakes and solutions while using ARRAY_AGG as an alternative to BULK COLLECT.
For more information about arrays and types in PostgreSQL, refer to Arrays and Aggregate Functions.
If you have any questions or suggestions about this post, leave a comment.
About the Authors
Vinay Paladi is a Lead Database Consultant with the Professional Services team at Amazon Web Services. He works as a database migration specialist, helping customers build highly available, cost-effective database solutions and migrate their commercial engines to the AWS Cloud. He is passionate about building innovative solutions to accelerate the database journey to the cloud.
Bikash Chandra Rout is a Lead Database Consultant with the Professional Services team at Amazon Web Services. Bikash focuses on helping customers build highly available, cost-effective database solutions and migrate their commercial engines to the AWS Cloud. He is passionate about databases and analytics.