AWS Database Blog

Managing object dependencies in PostgreSQL – Overview and helpful inspection queries (Part 1)

In PostgreSQL, object binding (or dependencies) encompasses the relationships existing among various database elements. These interdependencies hold significant importance when it comes to the management and modification of objects within the database. They ensure that adjustments made to one object don’t inadvertently disrupt other dependent objects.

For instance, when a view relies on a table, any alterations to the table structure, such as modifying utilized columns, altering data types, or even dropping the table, can directly influence the functionality of the associated view. To make informed choices and preempt potential issues, it’s imperative to grasp the intricacies of these dependencies.

In this blog series, we explore the concept of object dependencies in PostgreSQL and their significance in managing a database effectively. Understanding and managing object dependencies is crucial for ensuring data integrity and making changes to the database without causing unexpected issues.

In this first post, we provide a comprehensive introduction to object dependencies and discuss various types of dependencies with examples. We also provide the queries that help you find the dependencies based on which you can take appropriate actions.

Prerequisites

To follow along with this post, complete the following prerequisites:

  1. Create an Amazon Aurora PostgreSQL-Compatible Edition cluster or Amazon Relational Database Service (Amazon RDS) for PostgreSQL instance if you don’t already have one. For instructions, refer to Create an Amazon Aurora PostgreSQL-Compatible DB cluster or Create a PostgreSQL DB instance, respectively.
  2. Create an Amazon Elastic Compute Cloud (Amazon EC2) instance to install the PostgreSQL client to access the Aurora PostgreSQL or RDS for PostgreSQL instance. For instructions, refer to Create your EC2 resources and launch your EC2 instance. Or you can set up connectivity between your RDS database and EC2 compute instance in one click.
  3. On Amazon Linux 2023, you can use the following commands to download the psql command line tool to run all the sql statements:
    sudo dnf install postgresql15.x86_64

Types of object dependencies

In this section, we introduce the different types of object dependencies and provide example code snippets.

Columns

Column dependencies in PostgreSQL can occur when one column relies on another column within the same table. This is commonly seen in scenarios where a calculated or derived column is based on the values of other columns. By establishing column dependencies, PostgreSQL ensures that the derived column is automatically updated whenever the values of the dependent columns change. This helps maintain data consistency and accuracy within the table.

Let’s consider the following example.

Create a sales table:

CREATE TABLE sales (
  sale_id SERIAL PRIMARY KEY,
  product_name VARCHAR(100),
  unit_price DECIMAL(10, 2),
  quantity INT,
  total_amount DECIMAL(10, 2) GENERATED ALWAYS AS (unit_price * quantity) STORED
);

In this example, the sales table has a calculated column, total_amount, which depends on the values of unit_price and quantity. The total_amount column is generated as the product of unit_price and quantity. This is a typical use case where the values of one column depend on the values of other columns in the same table.

The GENERATED ALWAYS AS clause is used to define the derived column, and the STORED keyword ensures that the value is physically stored, allowing for efficient querying.

This is just one example of column dependencies in PostgreSQL, showcasing how you can create calculated or derived columns that rely on other columns within the same table.

Indexes

When an index is created on a table, it becomes dependent on that table. Any changes made to the table, such as inserting, updating, or deleting rows, can impact the index. For example, if a row is inserted into the table, the index is updated to include the new row for efficient data retrieval.

Let’s consider an example to illustrate this concept:

CREATE TABLE customers (
  customer_id SERIAL PRIMARY KEY,
  first_name VARCHAR(50),
  last_name VARCHAR(50),
  email VARCHAR(100)
);

CREATE INDEX idx_customers_email ON customers(email);

In this example, we create a table called customers with columns for customer information. We then create an index called idx_customers_email on the email column. The index is dependent on the customers table.

Now, suppose we insert a new row into the customers table:

INSERT INTO customers (first_name, last_name, email)
VALUES ('John', 'Doe', 'john.doe@example.com');

This insertion triggers an update to the idx_customers_email index, ensuring that it includes the new row for efficient email-based searches.

Sequences

A sequence is an object that generates a sequence of unique values. It is often used to generate unique identifiers for tables.

When a column in a table is defined as a sequence, it establishes a dependency between the column and the sequence object. The column depends on the sequence to generate the next unique value when inserting new records into the table.

Let’s consider the following example:

-- Create a sequence
CREATE SEQUENCE employee_id_seq;

-- Create a table with a column that uses the sequence
CREATE TABLE employees (
  employee_id INTEGER DEFAULT nextval('employee_id_seq') PRIMARY KEY,
  first_name VARCHAR(50),
  last_name VARCHAR(50)
);

In this example, we first create a sequence named employee_id_seq using the CREATE SEQUENCE statement. This sequence will be used to generate unique values for the employee_id column.

Next, we create a table called employees with three columns: employee_id, first_name, and last_name. The employee_id column is defined with a default value of nextval('employee_id_seq'), which means that it will automatically obtain the next value from the sequence when a new record is inserted. The employee_id column is also declared as the primary key of the table.

User-defined data type

A user-defined data type (UDT) represents a structure or record containing single or multiple fields or attributes. It allows you to group related data together into a single object.

UDTs are used to define columns in tables or as the return type of functions. When a table column or a function returns a UDT, it establishes a dependency between the UDT and the table or function.

Let’s consider following example:

CREATE TYPE employee_info AS (
  name VARCHAR,
  address VARCHAR
);

CREATE TABLE employees_info (
  id SERIAL PRIMARY KEY,
  info employee_info
);

In this example, the employees table has a dependency on the composite data type employee_info. Any changes made to the definition of the employee_info type will affect the employees table.

Views

Views are virtual tables that are generated from the result of a query. They depend on the underlying tables and columns used in the query that defines the view. Any changes made to the referenced tables or columns can impact the views built on them. Views provide a dynamic and user-friendly way to interact with complex data structures, making them an essential aspect of database design.

Let’s consider the following example:

CREATE VIEW orders_summary AS
SELECT order_id, customer_id, order_date
FROM orders;

The orders_summary view relies on the orders table for its operation. Any adjustments applied to the orders table columns utilized in the view, whether it involves adding or removing columns or altering the data types of columns, can have an effect on how the orders_summary view functions.

In this specific case, the orders_summary view selects the order_id, customer_id, and order_date columns from the orders table. Therefore, any changes to these columns in the orders table will be reflected in the orders_summary view.

Functions and Procedures

Functions in PostgreSQL can have dependencies on tables, views, or other functions. Understanding these dependencies is crucial when developing complex database functions, because it ensures their reliability and consistency.

Let’s consider the following example:

CREATE FUNCTION calculate_salary(employee_id INTEGER)
RETURNS NUMERIC
AS $$
DECLARE
  hourly_wage NUMERIC;
  hours_worked NUMERIC;
BEGIN
  SELECT wage, hours INTO hourly_wage, hours_worked FROM employees WHERE id = employee_id and date > (now()-’1 month’::interval);
  SELECT hours INTO hours_worked FROM employees WHERE id = employee_id;
  RETURN hourly_wage * hours_worked;
END;
$$ LANGUAGE plpgsql;

This function, calculate_salary, depends on a table named employees. It retrieves the wage and hours values from the employees table based on the employee_id parameter. The function then calculates and returns the salary by multiplying the hourly_wage and hours_worked values.

This function demonstrates a dependency on the employees table because it relies on the existence of the table and the availability of the wage and hours columns within that table. Any changes to the table structure or the column names or types that were directly referenced in the function could potentially impact the functionality of this function.

Constraints

Constraints, such as primary key or unique constraints, establish dependencies on associated columns or tables. For example, a foreign key constraint on a table creates a direct dependency on another table. This ensures that the data in the referencing table adheres to specific rules defined in the referenced table. Constraints play a critical role in enforcing data integrity within a database.

Let’s consider the following example:

CREATE TABLE customers (
  customer_id SERIAL PRIMARY KEY,
  email VARCHAR(50) UNIQUE
);

CREATE TABLE orders_tab (
  order_id SERIAL PRIMARY KEY,
  customer_id INTEGER REFERENCES customers(customer_id),
  order_date DATE
);

In this example, the orders_tab table has a dependency on the customers table through the customer_id foreign key constraint.

Triggers

Triggers are database objects that automatically run in response to specific events, such as data modifications. They can depend on tables, views, or other triggers. When a trigger is fired, it may run actions that depend on the data or structure of the database, creating a dynamic and event-driven aspect of database management.

Let’s consider the following example:

CREATE TABLE products (
  product_id SERIAL PRIMARY KEY,
  name VARCHAR(50),
  quantity INTEGER,
  total_sales NUMERIC
);


CREATE OR REPLACE FUNCTION update_total_sales() RETURNS TRIGGER AS $$
BEGIN
  NEW.total_sales = NEW.quantity * 10; --Assuming that the price per unit is 10
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_trigger
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION update_total_sales();

In this example, the trigger function update_total_sales() multiplies the quantity by 10 (assuming that the price per unit is 10) and updates the total_sales column. The trigger products_trigger is set to run this function before an INSERT or UPDATE operation on the products table.

Inspecting and analyzing object dependencies

In PostgreSQL, it is possible to determine what objects, such as tables, views, or functions, have dependencies on each other. This information is stored in the PostgreSQL system catalog, a set of tables and views that contain metadata about all the objects in the database. By querying these system catalog views with the appropriate SQL, you can gain insights into object dependencies. These dependencies can be important to understand when making changes to database objects, as changes can have cascading effects on dependent objects. Thus, understanding these dependencies can help you maintain the integrity and functionality of your database.

Here are some of the basic catalog views that PostgreSQL provides:

  • pg_attribute – This table stores detailed information regarding attributes for tables, views, composite types, and indices.
  • pg_namespace – This table holds information related to database schemas.
  • pg_class – It functions as a directory for various database objects.
  • pg_rewrite – This table contains rules for transforming queries and views.
  • pg_depend – It is used to track information about dependencies between different database objects.
  • pg_shdepend – This table is used to monitor dependencies on objects that are utilized across multiple databases.

With these tools, you can view the inner workings of your database and understand how different parts are linked. This knowledge helps you make smart decisions when you’re managing your PostgreSQL database, almost like being an investigator in a puzzle, figuring out how all the pieces fit together.

Queries to find the object dependencies

In this section, we walk you through the queries to find the object dependencies for each object type.

Find foreign key dependencies

To find the table dependencies with respect to referential integrity constraint references, use the following query:

SELECT la.attrelid::regclass AS referencing_table,
       la.attname AS referencing_column
FROM pg_catalog.pg_constraint AS c
   JOIN pg_catalog.pg_index AS i
      ON i.indexrelid = c.conindid
   JOIN pg_catalog.pg_attribute AS la
      ON la.attrelid = c.conrelid
         AND la.attnum = c.conkey[1]
   JOIN pg_catalog.pg_attribute AS ra
      ON ra.attrelid = c.confrelid
         AND ra.attnum = c.confkey[1]
WHERE c.confrelid = '<table_name>'::regclass
  AND c.contype = 'f'
  AND cardinality(c.confkey) = 1;

From previous examples, the orders and order_items tables are dependent. The order_id column of the order_items table is dependent on the order_id column of the orders table. Let’s run the query for the orders table and see the dependent table:

SELECT la.attrelid::regclass AS referencing_table,
        la.attname AS referencing_column
 FROM pg_catalog.pg_constraint AS c
    JOIN pg_catalog.pg_index AS i
       ON i.indexrelid = c.conindid
    JOIN pg_catalog.pg_attribute AS la
       ON la.attrelid = c.conrelid
          AND la.attnum = c.conkey[1]
    JOIN pg_catalog.pg_attribute AS ra
       ON ra.attrelid = c.confrelid
          AND ra.attnum = c.confkey[1]
 WHERE c.confrelid = 'orders'::regclass
   AND c.contype = 'f'
   AND cardinality(c.confkey) = 1;

Following is the output of the query:

 referencing_table | referencing_column
-------------------+--------------------
 order_items       | order_id
(1 row)

The query shows the order_items table and order_id column as dependent in the table created.

Find table type dependencies

The following is a query to find the dependent table type for a table:

SELECT
    attrelid::regclass AS source_table,
    atttypid::regtype AS dependent_table_type
FROM
    pg_catalog.pg_attribute
WHERE
    attrelid = '<source_table>'::regclass
    AND atttypid::regtype::text IN (
        SELECT
            relname
        FROM
            pg_catalog.pg_class);

From our other example, the car table has a dependency on Person table because one of the column’s data types in the car table is the person table.

SELECT
     attrelid::regclass AS source_table,
     atttypid::regtype AS dependent_table_type
 FROM
     pg_catalog.pg_attribute
 WHERE
     attrelid = 'car'::regclass
     AND atttypid::regtype::text IN (
         SELECT
             relname
         FROM
             pg_catalog.pg_class);

Following is the output of the query:

 source_table | dependent_table_type
--------------+----------------------
 car          | person
(1 row)

Find column dependencies on another column

The following is the query to find the dependent columns for a column in the same table:

SELECT DISTINCT n.nspname AS table_schema,
    c.relname AS table_name,
    ac.attname AS column_name,
    ad.attname AS dependent_column
   FROM pg_catalog.pg_namespace n,
    pg_catalog.pg_class c,
    pg_catalog.pg_depend d,
    pg_catalog.pg_attribute ac,
    pg_catalog.pg_attribute ad,
    pg_catalog.pg_attrdef atd
  WHERE n.oid = c.relnamespace AND c.oid = ac.attrelid AND c.oid = ad.attrelid AND ac.attnum <> ad.attnum AND ad.attrelid = atd.adrelid AND ad.attnum = atd.adnum AND d.classid = 'pg_attrdef'::regclass::oid AND d.refclassid = 'pg_class'::regclass::oid AND d.objid = atd.oid AND d.refobjid = ac.attrelid AND d.refobjsubid = ac.attnum AND ad.attgenerated <> ''::"char" AND pg_has_role(c.relowner, 'USAGE'::text)
  AND c.relname='<table_name>';

From the example, the sales table has a total_amount column, which is a derived column from the quantity and unit_price columns. The following query shows those dependent columns:

SELECT DISTINCT
    n.nspname AS table_schema,
    c.relname AS table_name,
    ac.attname AS column_name,
    ad.attname AS dependent_column
FROM
    pg_catalog.pg_namespace n,
    pg_catalog.pg_class c,
    pg_catalog.pg_depend d,
    pg_catalog.pg_attribute ac,
    pg_catalog.pg_attribute ad,
    pg_catalog.pg_attrdef atd
WHERE
    n.oid = c.relnamespace
    AND c.oid = ac.attrelid
    AND c.oid = ad.attrelid
    AND ac.attnum <> ad.attnum
    AND ad.attrelid = atd.adrelid
    AND ad.attnum = atd.adnum
    AND d.classid = 'pg_attrdef'::regclass::oid
    AND d.refclassid = 'pg_class'::regclass::oid
    AND d.objid = atd.oid
    AND d.refobjid = ac.attrelid
    AND d.refobjsubid = ac.attnum
    AND ad.attgenerated <> ''::"char"
    AND pg_has_role(c.relowner, 'USAGE'::text)
    AND c.relname = 'sales';

Following is the output of the query:

 table_schema | table_name | column_name | dependent_column
--------------+------------+-------------+------------------
 public       | sales      | quantity    | total_amount
 public       | sales      | unit_price  | total_amount
(2 rows)

Find indexes of a table

The following is a query to find the indexes for a table:

SELECT * FROM pg_catalog.pg_indexes WHERE tablename ='<table_name>';

From the example, the customers table has a unique key and an index. The following is a query to find the indexes for the customers table:

SELECT * FROM pg_catalog.pg_indexes WHERE tablename ='customers';

Following is the output of the query:

 schemaname | tablename |      indexname      | tablespace |                                 indexdef
------------+-----------+---------------------+------------+--------------------------------------------------------------------------
 public     | customers | customers_pkey      |            | CREATE UNIQUE INDEX customers_pkey ON public.customers USING btree (id)
 public     | customers | idx_customers_email |            | CREATE INDEX idx_customers_email ON public.customers USING btree (email)
(2 rows)

Find sequence dependencies on the tables

The following is the query to find the sequence dependencies:

SELECT
    table_name,
    column_name,
    column_default
FROM (
    SELECT
        attrelid::regclass AS table_name,
        attname AS column_name,
        CASE WHEN a.attgenerated = ''::"char" THEN
            pg_get_expr(ad.adbin, ad.adrelid)
        ELSE
            NULL::text
        END AS column_default
    FROM
        pg_catalog.pg_attribute a
    LEFT JOIN pg_catalog.pg_attrdef ad ON a.attrelid = ad.adrelid
        AND a.attnum = ad.adnum
WHERE
    a.attrelid = ''::regclass) foo
WHERE
    column_default LIKE 'nextval%';

From the example, the employees table has an employee_id column with a default value to the next value from a sequence. The following is the query to find the sequence that refers to the employees table:

SELECT
    table_name,
    column_name,
    column_default
FROM (
    SELECT
        attrelid::regclass AS table_name,
        attname AS column_name,
        CASE WHEN a.attgenerated = ''::"char" THEN
            pg_get_expr(ad.adbin, ad.adrelid)
        ELSE
            NULL::text
        END AS column_default
    FROM
        pg_catalog.pg_attribute a
    LEFT JOIN pg_catalog.pg_attrdef ad ON a.attrelid = ad.adrelid
        AND a.attnum = ad.adnum
WHERE
    a.attrelid = 'employees'::regclass) foo
WHERE
    column_default LIKE 'nextval%';

Following is the output of the query:

table_name | column_name |            column_default
------------+-------------+--------------------------------------
 employees  | employee_id | nextval('employee_id_seq'::regclass)

Find UDT dependencies on tables and columns

The following is the query to find UDTs defined to a table’s column:

SELECT
    COALESCE(nbt.nspname, nt.nspname) AS udt_schema,
    COALESCE(bt.typname, t.typname) AS udt_name,
    current_database() AS table_catalog,
    nc.nspname AS table_schema,
    c.relname AS table_name,
    a.attname AS column_name
FROM
    pg_catalog.pg_attribute a,
    pg_catalog.pg_class c,
    pg_catalog.pg_namespace nc,
    pg_catalog.pg_type t
    JOIN pg_catalog.pg_namespace nt ON t.typnamespace = nt.oid
    LEFT JOIN (pg_type bt
        JOIN pg_catalog.pg_namespace nbt ON bt.typnamespace = nbt.oid) ON t.typtype = 'd'::"char"
        AND t.typbasetype = bt.oid
WHERE
    a.attrelid = c.oid
    AND a.atttypid = t.oid
    AND nc.oid = c.relnamespace
    AND a.attnum > 0
    AND NOT a.attisdropped
    AND (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char", 'p'::"char"]))
    AND COALESCE(bt.typname, t.typname) = '’;

From the example, the employees_info table’s info column is defined as the employee_info UDT. The following query finds the dependency:

SELECT
    COALESCE(nbt.nspname, nt.nspname) AS udt_schema,
    COALESCE(bt.typname, t.typname) AS udt_name,
    current_database() AS table_catalog,
    nc.nspname AS table_schema,
    c.relname AS table_name,
    a.attname AS column_name
FROM
    pg_catalog.pg_attribute a,
    pg_catalog.pg_class c,
    pg_catalog.pg_namespace nc,
    pg_catalog.pg_type t
    JOIN pg_catalog.pg_namespace nt ON t.typnamespace = nt.oid
    LEFT JOIN (pg_type bt
        JOIN pg_catalog.pg_namespace nbt ON bt.typnamespace = nbt.oid) ON t.typtype = 'd'::"char"
        AND t.typbasetype = bt.oid
WHERE
    a.attrelid = c.oid
    AND a.atttypid = t.oid
    AND nc.oid = c.relnamespace
    AND a.attnum > 0
    AND NOT a.attisdropped
    AND (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char", 'p'::"char"]))
    AND COALESCE(bt.typname, t.typname) = 'employee_info';

Following is the output of the query:

udt_schema |   udt_name    | table_catalog | table_schema |   table_name   | column_name
-------------+------------+---------------+---------------+--------------+----------------+-------------
public     | employee_info | object_blog   | public       | employees_info | info
(1 row)

Find view dependencies

The following is the query to find the dependent views of a table:

SELECT DISTINCT v.oid::regclass AS view
FROM pg_catalog.pg_depend AS d     
   JOIN pg_catalog.pg_rewrite AS r 
      ON r.oid = d.objid
   JOIN pg_catalog.pg_class AS v    
      ON v.oid = r.ev_class
WHERE v.relkind = 'v'    
  AND d.classid = 'pg_rewrite'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND d.deptype = 'n'    
  AND d.refobjid = '<table_name>'::regclass;

From the example, the orders_summary view is based on the orders table. The following query finds the dependent views of the orders table:

SELECT DISTINCT v.relnamespace::regnamespace, v.oid::regclass AS view
 FROM pg_catalog.pg_depend AS d
    JOIN pg_catalog.pg_rewrite AS r
       ON r.oid = d.objid
    JOIN pg_catalog.pg_class AS v
       ON v.oid = r.ev_class
 WHERE v.relkind = 'v'
   AND d.classid = 'pg_rewrite'::regclass
   AND d.refclassid = 'pg_class'::regclass
   AND d.deptype = 'n'
   AND d.refobjid = 'orders'::regclass;

Following is the output of the query:

 relnamespace |      view
--------------+----------------
 public       | orders_summary
(1 row)

Find triggers

The following is the query to find the trigger dependencies of a table:

SELECT
    tgrelid::regclass AS table_name,
    tgname AS trigger_name
FROM
    pg_catalog.pg_trigger
WHERE
    tgrelid = ''::regclass;

From the example, the products table has products_trigger. The following query finds the trigger dependencies of the products table:

SELECT
    tgrelid::regclass AS table_name,
    tgname AS trigger_name
FROM
    pg_catalog.pg_trigger
WHERE
    tgrelid = 'products'::regclass;

Following is the output of the query:

table_name |   trigger_name
------------+------------------
 products   | products_trigger 
(1 row)

Conclusion

In this first part of our series on object dependencies in PostgreSQL, we introduced the concept and discussed different types of dependencies. We also explored the importance of understanding object dependencies in managing a database effectively. Additionally, we highlighted some of the views and functions available in PostgreSQL for querying object dependencies.

Stay tuned for the next part of our series, where we will dive deep into the tools and techniques for managing object dependencies in PostgreSQL. We will explore how to use DROP and ALTER statements to handle dependencies and discuss best practices for avoiding potential problems.

By gaining a deeper understanding of object dependencies, you can optimize your PostgreSQL database operations, ensure data integrity, and make informed decisions when making changes to your database.


About the authors

Baji Shaik is a Sr. Database Consultant with AWS Professional Services, Global Competency Centre. He is a Database Migration Expert and has developed many successful database solutions addressing challenging business requirements for moving databases from on premises Oracle, SQL Server to Amazon RDS and Amazon Aurora PostgreSQL/MySQL. He is an eminent author, having written several books on PostgreSQL. A few of his recent works include “ Procedural Programming with PostgreSQL PL/pgSQL” “ PostgreSQL Configuration,”Beginning PostgreSQL on the Cloud,” and “ PostgreSQL Development Essentials.” Furthermore, he has delivered several conference and workshop sessions.

Wajid Ali Mir is a Database Consultant at AWS and works as a database migration specialist, helping and enabling customers to build highly available, cost-effective database solutions and migrate their database workloads to the AWS Cloud. Wajid works with AWS customers and partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.

Viswanatha Shastry Medipalli is a Consultant with the AWS ProServe team. His background spans a wide depth and breadth of expertise and experience in database migrations. He has architected and designed many successful database solutions addressing challenging business requirements. He has provided solutions using Oracle, SQL Server and PostgreSQL for reporting, BI, applications, and development support. He also has a good knowledge of automation, and orchestration. His focus area is homogeneous and heterogeneous migrations of on-premise databases to Amazon.

Yaser Raja is a Principal Strategic Tech Advisor with Professional Services team at Amazon Web Services. He works with customers to build scalable, highly available and secure solutions in AWS cloud. His focus area includes database migrations and modernization, and leveraging Generative AI capabilities to build innovative solutions for the customers.”