AWS Database Blog
Migration tips for developers converting Oracle and SQL Server code to PostgreSQL
PostgreSQL is one of the most popular open-source relational database systems. It is considered to be one of the top database choices when customers migrate from commercial databases such as Oracle and Microsoft SQL Server. AWS provides two managed PostgreSQL options: Amazon RDS and Amazon Aurora.
In addition to providing managed PostgreSQL services, AWS also provides tools and resources to help with migration. AWS Schema Conversion Tool (SCT) is a free AWS tool that helps you convert your existing schemas and supports several source and target databases. AWS also provides AWS Database Migration Service (DMS), which helps transfer and continuously replicate data between heterogeneous and homogenous databases. Similarly, AWS provides migration playbooks that document a large number of functional mappings between commercial databases and open-source databases such as PostgreSQL.
This post provides tips and best practices for converting code from PL/SQL to PL/pgSQL, which can help you achieve better performance and cleaner code conversions to PostgreSQL. This post is targeted at developers working on database migrations and assumes that readers have a basic knowledge of databases and PL/SQL.
Performance considerations
This section describes some of the factors that influence PostgreSQL performance when you migrate from commercial or legacy databases such as SQL Server or Oracle. Most databases have similar objects, but choosing the right object type changes the behavior of the system. This section explains how to achieve better performance with stored procedures, functions, and SQL statements.
Data types
To avoid rework, correctly map the data types from the source system to the target database before starting the project. The following table summarizes some common data type mappings from Oracle and SQL Server to PostgreSQL.
| Oracle | PostgreSQL | SQL Server | Notes |
| --- | --- | --- | --- |
| Number | Smallint | Tinyint / Smallint | Generally for lookup tables whose values are limited. |
| Number | Integer / Bigint | Integer / Bigint | |
| Number | Double Precision / Float / Numeric | Double Precision / Float / Numeric | For the financial domain in which you want an application to store high-precision values, configure the column as numeric/decimal. Otherwise, double precision or float may be sufficient. |
| Varchar | Char(n) / Varchar(n) / Text / Character varying | Nchar / Nvarchar / Ntext | |
| Timestamp(6) | Timestamp without time zone | DateTime2(p) / DateTime | |
| Clob | Text | | |
| Blob / Raw | Bytea | Binary / Image / VarBinary | |
| Boolean | Boolean | Bit | |
| XML | XML | XML | |
Why number to smallint/integer/bigint and not numeric?
To get the best performance from your database, it is important to use optimal data types.
If your table column must hold at most a four-digit number, a 2-byte (smallint) column is sufficient, rather than a 4-byte (integer/real), 8-byte (bigint/double precision), or variable-size (numeric) data type.
Numeric is a variable-size type that can hold up to 131072 digits before the decimal point and is recommended for storing monetary amounts and other quantities for which exactness is required. However, calculations on numeric values are very slow compared to the integer or floating-point types, because numeric arithmetic is implemented in software.
The following table gives an example of how the size of a table grows for a single column when you compare numeric size with smallint/int/bigint for non-precision columns, excluding indexes.
| Table name | Size | External size | Value inserted |
| --- | --- | --- | --- |
| numericsize | 16 KB | 8192 bytes | insert into numericsize values (1234678); |
| smallintsize | 8192 bytes | 0 bytes | insert into smallintsize values (1234); |
| intsize | 8192 bytes | 0 bytes | insert into intsize values (123457); |
| bigintsize | 8192 bytes | 0 bytes | insert into bigintsize values (123486); |
The following table uses the same information as the previous table, but includes indexes. For this table, size refers to the total size of the table, and external size is the size of related objects such as indexes.
| Table name | Size | External size |
| --- | --- | --- |
| numericsize | 32 KB | 24 KB |
| smallintsize | 24 KB | 16 KB |
| intsize | 24 KB | 16 KB |
| bigintsize | 24 KB | 16 KB |
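Measurements like those in the tables above can be reproduced with a script along the following lines; this is a minimal sketch, assuming single-column test tables named as shown (add an index to each table to observe the second set of sizes):

```sql
-- Create single-column test tables (names match the tables above)
CREATE TABLE numericsize  (n numeric);
CREATE TABLE smallintsize (n smallint);
CREATE TABLE intsize      (n integer);
CREATE TABLE bigintsize   (n bigint);

INSERT INTO numericsize  VALUES (1234678);
INSERT INTO smallintsize VALUES (1234);
INSERT INTO intsize      VALUES (123457);
INSERT INTO bigintsize   VALUES (123486);

-- Table size, plus total size minus table size (indexes, TOAST, and so on)
SELECT relname AS table_name,
       pg_size_pretty(pg_relation_size(oid))   AS size,
       pg_size_pretty(pg_total_relation_size(oid)
                      - pg_relation_size(oid)) AS external_size
FROM pg_class
WHERE relname IN ('numericsize', 'smallintsize', 'intsize', 'bigintsize');
```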
AWS SCT maps number to the numeric data type for tables without knowing the actual data size. The tool has an option to configure the mapping to the right data type during conversion.
Procedures and functions
PostgreSQL 10 and older versions do not support stored procedures. All procedures and functions from Oracle and SQL Server are mapped to functions in PostgreSQL. Procedures are supported in PostgreSQL starting from version 11, and are similar to those in Oracle.
PostgreSQL supports three volatility categories of functions, and you must use the appropriate category based on the functionality while migrating: Volatile, Stable, and Immutable. Marking the function with the appropriate category can be an important performance tweak.
Volatile
The Volatile type indicates that the function value can change even within a single table scan, so no optimizations can be made. Relatively few database functions are volatile; some examples are random(), currval(), and timeofday(). Any function that has side effects must be classified as volatile, even if its result is predictable, to prevent calls from being optimized away; one example is setval(). If the volatility category is not provided during function creation, all new functions are marked as volatile by default.
Below is a sample function to show the time taken to execute a Volatile function.
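The original code sample is not reproduced here; the following is a minimal sketch of a volatile function and a timing run. The function name and row count are illustrative assumptions.

```sql
-- A trivial function; with no category specified it defaults to VOLATILE,
-- so the planner must call it once per row.
CREATE OR REPLACE FUNCTION add_ten_volatile(num integer)
RETURNS integer AS $$
BEGIN
    RETURN num + 10;
END;
$$ LANGUAGE plpgsql VOLATILE;

-- Time one call per row over a million rows
EXPLAIN ANALYZE
SELECT add_ten_volatile(10) FROM generate_series(1, 1000000);
```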
Stable
The Stable type indicates that the function cannot modify the database. It also indicates that within a single table scan, the function consistently returns the same result for the same argument values, but its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups or parameter variables, such as the current time zone. The current_timestamp family of functions qualifies as stable, because their values do not change within a transaction.
Below is a sample function to show the time taken to execute a Stable function. Execute the function below to see its cost.
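Again, the original sample is omitted; the following is a comparable sketch declared STABLE (names are illustrative assumptions):

```sql
-- Same logic as the volatile variant, but declared STABLE: within a single
-- statement, repeated calls with the same argument may be optimized.
CREATE OR REPLACE FUNCTION add_ten_stable(num integer)
RETURNS integer AS $$
BEGIN
    RETURN num + 10;
END;
$$ LANGUAGE plpgsql STABLE;

EXPLAIN ANALYZE
SELECT add_ten_stable(10) FROM generate_series(1, 1000000);
```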
Immutable
The Immutable type indicates that the function cannot modify the database and always returns the same result when given the same argument values. This means it does not do database lookups or otherwise use information not directly present in its argument list. With this category, any call of the function with all-constant arguments can be immediately replaced with the function value.
Below is a sample function to show the time taken to execute an Immutable function. Execute the function below to see its cost.
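The original sample is omitted; a comparable sketch declared IMMUTABLE (names are illustrative assumptions):

```sql
-- Declared IMMUTABLE: a call with constant arguments can be folded to a
-- constant at planning time, so it is evaluated once, not per row.
CREATE OR REPLACE FUNCTION add_ten_immutable(num integer)
RETURNS integer AS $$
BEGIN
    RETURN num + 10;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

EXPLAIN ANALYZE
SELECT add_ten_immutable(10) FROM generate_series(1, 1000000);
```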
Although all of these functions deliver the same value, you may need to use any of the three volatility categories, depending on the functionality, to achieve better performance. A test run of each function shows that although they contain the same functionality, the Immutable variant takes the least time. This is because the Immutable category allows the optimizer to pre-evaluate the function during query calls with constant arguments.
Function calls in views and queries
Many applications use views and queries that contain function calls. As discussed in the previous section, in PostgreSQL, this can be a costly operation, especially if the function volatility category is not set correctly. In addition to this, the function call itself adds to the query cost.
Choose the appropriate volatility category for your function based on what the function does. If your function truly is Immutable or Stable, setting that category instead of using the default of Volatile could give you some performance advantages.
The following example code is a query with a Volatile function call. The function getDeptname() is marked as volatile. The total runtime for the query is 2 seconds and 886 milliseconds.
The following example code is a query with the Stable function call.
The following example code replaces the function call with the equivalent functionality.
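The original query samples are not included here; the following sketch illustrates the pattern. The emp and dept tables, the getDeptname() body, and the join column are assumptions for illustration.

```sql
-- A lookup function; by default it is VOLATILE, so it runs once per row.
-- Re-create it with STABLE to compare runtimes.
CREATE OR REPLACE FUNCTION getDeptname(p_deptno integer)
RETURNS text AS $$
DECLARE
    v_dname text;
BEGIN
    SELECT dname INTO v_dname FROM dept WHERE deptno = p_deptno;
    RETURN v_dname;
END;
$$ LANGUAGE plpgsql;

-- Query with the function call
SELECT empno, ename, getDeptname(deptno) FROM emp;

-- The same query with the function call replaced by its functionality
SELECT e.empno, e.ename, d.dname
FROM emp e
LEFT JOIN dept d ON d.deptno = e.deptno;
```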
Optimizing exceptions
PostgreSQL provides the functionality to trap and raise errors using the Exception and Raise statements. This is useful functionality, but it comes at a cost.
Raise statements raise errors and exceptions during a PL/pgSQL function’s operation. By default, any error occurrence inside a PL/pgSQL function aborts execution of the function and rolls back the changes. To recover from errors, PL/pgSQL can trap the errors using the Exception clause. For this functionality, PostgreSQL has to save the state of the transaction before entering the block of code with exception handling. This is an expensive operation, so it adds overhead cost.
To avoid this overhead, it is recommended to either catch the exceptions on the application side, or to make sure that the required validation is in place so that the function never causes an exception.
The following code example demonstrates the performance impact of having an exception in a function call.
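The original example is not shown here; the following sketch (function names and logic are illustrative assumptions) contrasts a function that traps division_by_zero with one that validates instead:

```sql
-- Version with an exception block: PostgreSQL saves a subtransaction
-- state on every call, which adds overhead.
CREATE OR REPLACE FUNCTION divide_with_exception(a numeric, b numeric)
RETURNS numeric AS $$
BEGIN
    RETURN a / b;
EXCEPTION
    WHEN division_by_zero THEN
        RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Version that validates instead, avoiding the exception block entirely
CREATE OR REPLACE FUNCTION divide_with_check(a numeric, b numeric)
RETURNS numeric AS $$
BEGIN
    IF b = 0 THEN
        RETURN NULL;
    END IF;
    RETURN a / b;
END;
$$ LANGUAGE plpgsql;

-- Compare timings over many rows
EXPLAIN ANALYZE SELECT divide_with_exception(i, 2) FROM generate_series(1, 1000000) i;
EXPLAIN ANALYZE SELECT divide_with_check(i, 2)     FROM generate_series(1, 1000000) i;
```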
If you can’t validate the input without an exception, the exception handling is clearly required. Otherwise, it is a good practice to avoid exception handling where possible; in cases like the preceding example, you can check the diagnostics after the statement to determine whether a change occurred.
Counter not required for fetch operation
Many applications get the total count and loop through the cursor to fetch the records. Because the fetch operation returns null when there are no more records, it is better to use the fetch status rather than declaring two extra variables and looping through the count. Reducing the number of statements to execute achieves better performance. See the following code as an example.
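The original loop is omitted; the following is a minimal sketch of looping on the fetch status (the employees table and cursor query are illustrative assumptions):

```sql
-- Loop on the fetch status instead of a separately maintained counter
DO $$
DECLARE
    cur refcursor;
    rec record;
BEGIN
    OPEN cur FOR SELECT * FROM employees;
    LOOP
        FETCH cur INTO rec;
        EXIT WHEN NOT FOUND;   -- no counter variables needed
        RAISE NOTICE 'Processing row: %', rec;
    END LOOP;
    CLOSE cur;
END;
$$;
```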
Check with EXISTS rather than count
In legacy applications, SQL queries are often written to find the count of matching records, and the required business logic is applied afterward. If the table has billions of records, getting the record count can be costly.
The code sample below demonstrates checking the count of rows and then updating the data.
This code can also be re-written to check for the existence of a single row rather than counting all matching rows, which can be more cost- and performance-effective. See the sample code below.
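The original samples are omitted; the following sketch contrasts the two approaches (the orders table and status column are illustrative assumptions):

```sql
-- Count-based check (costly on large tables): scans all matching rows
DO $$
DECLARE
    v_count bigint;
BEGIN
    SELECT count(*) INTO v_count FROM orders WHERE status = 'PENDING';
    IF v_count > 0 THEN
        UPDATE orders SET status = 'PROCESSING' WHERE status = 'PENDING';
    END IF;
END;
$$;

-- EXISTS-based check: stops at the first matching row
DO $$
BEGIN
    IF EXISTS (SELECT 1 FROM orders WHERE status = 'PENDING') THEN
        UPDATE orders SET status = 'PROCESSING' WHERE status = 'PENDING';
    END IF;
END;
$$;
```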
The total runtime of the query is 104 milliseconds.
Record count after DML statements
In most legacy applications, the record count indicates whether a data manipulation statement changed any data. In PostgreSQL, this information is maintained in the command diagnostics and can be retrieved after the operation, avoiding a separate count. Use diagnostics to retrieve the number of rows affected, as shown in the code sample below.
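The original sample is omitted; a minimal sketch using GET DIAGNOSTICS (the accounts table is an illustrative assumption):

```sql
-- Retrieve the affected row count from diagnostics instead of re-counting
DO $$
DECLARE
    v_rows integer;
BEGIN
    UPDATE accounts SET balance = balance * 1.01 WHERE active;
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    RAISE NOTICE 'Rows updated: %', v_rows;
END;
$$;
```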
Pattern match and search
It’s common practice to use the wildcard characters % or _ with the LIKE (or ILIKE for case-insensitive searches) expression while retrieving data from tables. If the wildcard character is at the start of the given pattern, the query planner can’t use an index even if one exists. In this case, a sequential scan is used, which is a time-consuming operation. To get better performance with millions of records and let the query planner use the available indexes, place the wildcard character in the middle or at the end rather than at the beginning of the predicate.
In addition to the LIKE expression, you can also use the pg_trgm module/extension for pattern matching. The pg_trgm module provides functions and operators that you can use to determine the similarity of alphanumeric text. It also provides index operator classes that support fast searching for similar strings. For more information, see the pg_trgm documentation on the PostgreSQL website.
Conversion mapping between Oracle, SQL Server, and PostgreSQL
This section provides database specific comparisons while writing SQL statements across Oracle, SQL Server, and PostgreSQL databases.
Default FROM clause
In Oracle, the FROM clause is mandatory, so you would write Select 1 from Dual;. In PostgreSQL and SQL Server, the FROM clause is optional, so you can simply write Select 1;.
Generating a series of values
You can generate a series of values from the start to the end number.
In Oracle, you don’t need a starting number, but can give an end number. See the following code as an example.
In SQL Server, use the following code.
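The original snippets are not included here; the following sketch shows common equivalents, with the Oracle and SQL Server statements as comments in a PostgreSQL script:

```sql
-- PostgreSQL: generate 1 through 10
SELECT generate_series(1, 10);

-- Oracle: no start number needed, only an end number
-- SELECT LEVEL FROM dual CONNECT BY LEVEL <= 10;

-- SQL Server: a recursive CTE is one common approach
-- WITH nums AS (
--     SELECT 1 AS n
--     UNION ALL
--     SELECT n + 1 FROM nums WHERE n < 10
-- )
-- SELECT n FROM nums;
```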
Join with (+) operator
In Oracle, for a left join, use the following code.
For a right join, use the following code.
For more information, see SQL for Beginners (Part 5): Joins on the Oracle database site.
PostgreSQL and SQL Server have no “(+)” operator for outer joins. Instead, use the ANSI LEFT JOIN and RIGHT JOIN syntax, as in the following two queries.
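The original statements are omitted; a sketch of the mapping (emp and dept are illustrative tables):

```sql
-- Oracle outer joins with (+), shown as comments:
--   Left join:  SELECT * FROM emp e, dept d WHERE e.deptno = d.deptno(+);
--   Right join: SELECT * FROM emp e, dept d WHERE e.deptno(+) = d.deptno;

-- ANSI equivalents, which work in both PostgreSQL and SQL Server
SELECT * FROM emp e LEFT  JOIN dept d ON e.deptno = d.deptno;
SELECT * FROM emp e RIGHT JOIN dept d ON e.deptno = d.deptno;
```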
Type as a parameter to functions
In SQL Server, you can pass multiple records with the Type data type. To implement the same in PostgreSQL, you can pass them as a JSON or text data type in JSON format, or as an array. The following example code uses the text data type in JSON format with multiple records. You can insert it into a temporary table and process it further with the following code.
Oracle
The following code shows how multiple records can be passed in the varchar data type in Oracle.
SQL Server
The following code shows how multiple records can be passed in table type in SQL Server for the same functionality given above in Oracle.
PostgreSQL
The following code shows how multiple records can be passed in as text type in PostgreSQL for the same functionality given above in Oracle and SQL Server.
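The original three-engine samples are omitted; the following is a minimal PostgreSQL sketch of the text-in-JSON-format approach (function, table, and field names are illustrative assumptions):

```sql
-- Pass multiple records as a JSON-formatted text parameter and expand
-- them into a temporary table for further processing
CREATE OR REPLACE FUNCTION load_employees(p_records text)
RETURNS integer AS $$
DECLARE
    v_rows integer;
BEGIN
    CREATE TEMP TABLE tmp_emp (empno integer, ename text) ON COMMIT DROP;

    INSERT INTO tmp_emp (empno, ename)
    SELECT empno, ename
    FROM json_to_recordset(p_records::json) AS r(empno integer, ename text);

    GET DIAGNOSTICS v_rows = ROW_COUNT;
    RETURN v_rows;  -- number of records received
END;
$$ LANGUAGE plpgsql;

SELECT load_employees('[{"empno":1,"ename":"Alice"},{"empno":2,"ename":"Bob"}]');
```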
Converting pivoting
In PostgreSQL, the pivoting functionality is not enabled by default and requires an extension. The tablefunc extension enables the crosstab function, which you can use to create pivot tables, similar to SQL Server and Oracle. The following is the pivoting functionality code in Oracle, SQL Server, and PostgreSQL.
Oracle
Implement the pivoting functionality in Oracle with the following code.
SQL Server
Implement the pivoting functionality in SQL Server with the following code.
PostgreSQL
Create the extension for PostgreSQL with the following code.
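The original samples are omitted; a minimal PostgreSQL sketch using crosstab (the sales table and its columns are illustrative assumptions):

```sql
-- Enable the tablefunc extension, which provides crosstab()
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Pivot quarterly amounts into one row per product
SELECT *
FROM crosstab(
    $$SELECT product, quarter, sum(amount)
      FROM sales
      GROUP BY product, quarter
      ORDER BY 1, 2$$
) AS ct (product text, q1 numeric, q2 numeric, q3 numeric, q4 numeric);
```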
Unpivoting to an array
There is no Unpivot function available in PostgreSQL. When converting from SQL Server or Oracle to PostgreSQL, the unpivot is mapped to an array. See the following code for an example.
Oracle
Implement the unpivoting functionality in Oracle with the following sample code.
SQL Server
Implement the unpivoting functionality in SQL Server with the following sample code.
PostgreSQL
Implement the unpivoting functionality in PostgreSQL with the following sample code.
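The original sample is omitted; a minimal sketch of unpivoting with arrays and unnest (the quarterly_sales table is an illustrative assumption):

```sql
-- Unpivot by pairing a literal array of column names with an array of
-- column values via unnest
SELECT product,
       unnest(ARRAY['q1', 'q2', 'q3', 'q4']) AS quarter,
       unnest(ARRAY[q1, q2, q3, q4])         AS amount
FROM quarterly_sales;
```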
Returning multiple result sets from a function
It is straightforward for SQL Server to return multiple result sets with multiple rows. You can accomplish the same in PostgreSQL and Oracle with cursors, as the samples below show.
Oracle
Return multiple result sets from a procedure in Oracle with the following code.
SQL Server
Return multiple result sets from a procedure in SQL Server with the following code. No extra parameters are required in SQL Server.
To execute the procedure in SQL Server, enter the following code.
PostgreSQL
Return multiple result sets from a procedure in PostgreSQL with the following code.
To execute the procedure in PostgreSQL, enter the following code.
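The original code is omitted; a minimal PostgreSQL sketch using INOUT refcursor parameters (table and cursor names are illustrative assumptions):

```sql
-- A procedure that returns two result sets via INOUT refcursors
CREATE OR REPLACE PROCEDURE get_emp_and_dept(
    INOUT cur_emp  refcursor,
    INOUT cur_dept refcursor)
LANGUAGE plpgsql AS $$
BEGIN
    OPEN cur_emp  FOR SELECT * FROM emp;
    OPEN cur_dept FOR SELECT * FROM dept;
END;
$$;

-- Execute inside a transaction and fetch from each named cursor
BEGIN;
CALL get_emp_and_dept('c_emp', 'c_dept');
FETCH ALL FROM c_emp;
FETCH ALL FROM c_dept;
COMMIT;
```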
Inline queries with alias
PostgreSQL semantics may refer to an inline view as a Subselect or Subquery. Oracle supports omitting the alias for the inner statement, but in PostgreSQL and SQL Server the alias is mandatory. The following code examples use B as an alias.
Oracle
The following code is a sample inline query in Oracle.
SQL Server and PostgreSQL
The same sample inline queries written in Oracle requires an alias name in SQL Server and PostgreSQL.
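The original queries are omitted; a minimal sketch of the difference (emp is an illustrative table):

```sql
-- Oracle accepts an inline view without an alias:
--   SELECT * FROM (SELECT empno, ename FROM emp);

-- PostgreSQL and SQL Server require an alias on the inline view
SELECT B.empno, B.ename
FROM (SELECT empno, ename FROM emp) B;
```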
Data order
After migrating data from either Oracle or SQL Server to PostgreSQL, the retrieval order of the data may vary. The reason could be the order of insertion, the data type of the column and its values, or the collation.
To get the correct order of the data, identify the business need and apply an Order by clause on the query to match the expected order.
dblink and foreign data wrappers
dblink is the functionality used to communicate across homogeneous and heterogeneous databases. As of this post, Amazon RDS and Aurora PostgreSQL don’t offer heterogeneous support, but they do have support to communicate across the PostgreSQL databases.
Communicating across homogeneous databases
PostgreSQL supports cross-database communication with dblink and foreign data wrappers (FDWs). This section discusses how to use dblink and FDWs.
Using foreign data wrappers
PostgreSQL supports FDWs, which you can use to access data stored in external sources. Amazon RDS and Aurora PostgreSQL support only PostgreSQL FDW currently. To configure the PostgreSQL FDW, complete the following steps.
- Create the extension with the following code.
- Create the server and link to external databases with the following code.
- Create the user mapping to access the tables from an external database with the following code.
Create user mapping for every user who would like to communicate via FDW.
- Import all the external tables into a local schema to access the data from the external tables just like regular tables. Here is the sample code to import the tables from the external database and schema.
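The steps above can be sketched as follows; the server, schema, and credential names are placeholders:

```sql
-- 1. Create the extension
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- 2. Create the server object pointing at the external database
CREATE SERVER remote_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote-host.example.com', port '5432', dbname 'salesdb');

-- 3. Map the local user to credentials on the external database;
--    repeat for every user who needs to communicate via the FDW
CREATE USER MAPPING FOR app_user
    SERVER remote_pg
    OPTIONS (user 'remote_user', password 'remote_password');

-- 4. Import the external tables into a local schema
CREATE SCHEMA IF NOT EXISTS remote_sales;
IMPORT FOREIGN SCHEMA public
    FROM SERVER remote_pg
    INTO remote_sales;
```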
Communicating across heterogeneous databases
PostgreSQL doesn’t natively support heterogeneous cross-database communication, and Amazon Aurora PostgreSQL has limitations here. However, you can implement dblink on the source environment (for example, Oracle or SQL Server) pointing to the target (PostgreSQL), and either pull or push the data.
For more information, see Cross-Database Querying in Compose PostgreSQL.
Creating a view for a foreign database table with dblink
dblink is a PostgreSQL contrib extension that allows you to perform short ad hoc queries in other databases. With the dblink option, the user must provide and store the password in clear text, which is visible to users. This option is not recommended unless you have no other choice.
For more information, see Foreign Data Wrapper and postgres_fdw documentation.
Option 1: Provide target database access details in the SQL statement itself
In this option, the host connection details and database credentials must be provided in every statement. If anything changes, such as the host or connection details, you must update multiple places.
Option 2: Separate out access details and use connection object
In this option, host and connection details are defined in one place, and the connection name is used for the cross-database connections.
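The original statements are omitted; the following sketch contrasts the two options (host, database, and credential values are placeholders):

```sql
-- Option 1: credentials inline in every statement
SELECT *
FROM dblink('host=remote-host.example.com dbname=salesdb user=remote_user password=remote_password',
            'SELECT empno, ename FROM emp')
    AS t(empno integer, ename text);

-- Option 2: open a named connection once, then reference it by name
SELECT dblink_connect('conn_sales',
    'host=remote-host.example.com dbname=salesdb user=remote_user password=remote_password');

SELECT *
FROM dblink('conn_sales', 'SELECT empno, ename FROM emp')
    AS t(empno integer, ename text);

SELECT dblink_disconnect('conn_sales');
```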
Function call with dblink
The following code is a function from a foreign PostgreSQL database that returns an integer.
The following code is a function from a foreign PostgreSQL database that returns a table type.
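The original statements are omitted; the following sketch assumes a named dblink connection conn_sales has already been opened with dblink_connect, and the remote function names are illustrative assumptions:

```sql
-- Call a remote function that returns an integer
SELECT *
FROM dblink('conn_sales', 'SELECT get_emp_count()')
    AS t(emp_count integer);

-- Call a remote function that returns a table type; the column list in
-- the alias must match the function's result columns
SELECT *
FROM dblink('conn_sales', 'SELECT * FROM get_employees()')
    AS t(empno integer, ename text);
```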
Finding the maximum and minimum value of a set of numbers
You may need the maximum or minimum value from a set of numbers when migrating to PostgreSQL. PostgreSQL includes the greatest and least functions for this, as demonstrated in the following code.
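A minimal sketch of these functions:

```sql
-- greatest() and least() operate across a list of values in one row,
-- unlike the aggregates max() and min(), which operate down a column
SELECT greatest(10, 20, 5) AS max_value,
       least(10, 20, 5)    AS min_value;
-- max_value = 20, min_value = 5
```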
Considering self-join for updates
Updates work differently in PostgreSQL compared to SQL Server if you use the same source table (the table that is being updated) in the from clause of the statement. In PostgreSQL, the second reference in the from clause is independent of the first reference, unlike in SQL Server, and the changes are applied to the entire table.
The following code example updates salaries for employees from Department 1.
This works as intended in SQL Server, but when you migrate it as-is, the same SQL statement updates the entire table rather than a single department. PostgreSQL works differently: it assumes that the two employee references are independent of each other, unlike SQL Server.
To update a single department, convert the DML to the following code.
If using Oracle, convert the DML to the following code.
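The original statements are omitted; the following sketch reconstructs the pattern (the employee table and its columns are illustrative assumptions):

```sql
-- SQL Server-style self-join update; in PostgreSQL the employee reference
-- in FROM is a second, independent table, so EVERY row gets updated:
UPDATE employee
SET sal = sal * 1.10
FROM employee e
WHERE e.deptno = 1;

-- PostgreSQL: drop the self-reference and filter the target table directly
UPDATE employee
SET sal = sal * 1.10
WHERE deptno = 1;

-- Oracle: the equivalent DML, also without a self-join
-- UPDATE employee SET sal = sal * 1.10 WHERE deptno = 1;
```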
Summary
This post shared some tips and best practices for developers working on migrations from commercial databases to PostgreSQL. It highlighted many decisions you must make during the migration and how they can impact your database performance. Keeping these performance aspects in mind during the conversion can help you avoid performance issues later in the migration.
If you have any questions or comments about this post, please share your thoughts in the comment section.
About the Author
Viswanatha Shastry Medipalli is a Consultant with the AWS ProServe team in India. His background spans a wide depth and breadth of expertise and experience in SQL database migrations. He has architected and designed many successful database solutions addressing challenging business requirements. He has provided solutions using Oracle, SQL Server, and PostgreSQL for reporting, business intelligence, applications, and development support. He also has good knowledge of automation and orchestration. His focus area is homogeneous and heterogeneous migrations of on-premises databases to Amazon RDS and Aurora PostgreSQL.