AWS Big Data Blog

How to implement access control and auditing on Amazon Redshift using Immuta

This post is co-written with Matt Vogt from Immuta. 

Organizations are looking for products that let them spend less time managing data and more time on core business functions. Data security is one of the key functions in managing a data warehouse. With Immuta integration with Amazon Redshift, user and data security operations are managed using an intuitive user interface. This blog post describes how to set up the integration, access control, governance, and user and data policies.

Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that makes it fast and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Today, tens of thousands of customers run business-critical workloads on Amazon Redshift. Amazon Redshift natively supports coarse-grained and fine-grained access control with features such as role-based access control, scoped permissionsrow-level security, column-level access control and dynamic data masking.

Immuta enables organizations to break down the silos that exist between data engineering teams, business users, and security by providing a centralized platform for creating and managing policy. Access and security policies are inherently technical, forcing data engineering teams to take responsibility for creating and managing these policies. Immuta empowers business users to effectively manage access to their own datasets and it enables business users to create tag and attribute-based policies. Through Immuta’s natural language policy builder, users can create and deploy data access policies without needing help from data engineers. This distribution of policies to the business enables organizations to rapidly access their data while ensuring that the right people use it for the right reasons.

Solution overview

In this blog, we describe how data in Redshift can be protected by defining the right level of access using Immuta. Let’s consider the following example datasets and user personas. These datasets, groups, and access policies are for illustration only and have been simplified to illustrate the implementation approach.

Datasets:

  • patients: Contains patients’ personal information such as name, address, date of birth (DOB), phone number, gender, and doctor ID
  • conditions: Contains the history of patients’ medical conditions
  • immunization: Contains patients’ immunization records
  • encounters: Contains patients’ medical visits and the associated payment and coverage costs

Groups:

  • Doctor: Groups users who are doctors
  • Nurse: Groups users who are nurses
  • Admin: Groups the administrative users

Following are the four permission policies to enforce.

  • Doctor should have access to all four datasets. However, each doctor should see only the data for their own patients. They should not be able to see all the patients
  • Nurse can access only the patients and immunization And can see all patients data.
  • Admin can access only the patients and encounters And can see all patients data.
  • Patients’ social security numbers and passport information should be masked for all users.

Pre-requisites

Complete the following steps before starting the solution implementation.

  1. Create Redshift data warehouse to load sample data and create users.
  2. Create users in a Redshift Use the following names for the implementation described in this post.
    • david, chris, jon, ema, jane
  3. Create user in Immuta as described in the documentation. You can also integrate your identify manager with Immuta to share user names. For the example in this post, you will use local users.
    • David Mill, Dr Chris, Dr Jon King, Ema Joseph, Jane D

Users

  1. Immuta SaaS deployment is used for this post. However, you can use either software as a service (SaaS) deployment or self-managed deployment.
  2. Download the sample datasets and upload them to your own Amazon Simple Storage Service (Amazon S3) This data is synthetic and doesn’t include real data.
  3. Download the SQL commands and replace the Amazon S3 file path in the COPY command with the file path of the uploaded files in your account.

Implementation

The following diagram describes the high-level steps in the following sections, which you will use to build the solution.

Solution Overview

1. Map users

  1. In the Immuta portal, navigate to People and choose Users. Select a user name to map to an Amazon Redshift user name.
  2. Choose Edit for the Amazon Redshift user name and enter the corresponding Redshift username.

Map Users

  1. Repeat the steps for the other users.

2. Set up native integration

To use Immuta, you must configure Immuta native integration, which requires privileged access to administer policies in your Redshift data warehouse. See the Immuta documentation for detailed requirements.

Use the following steps to create native integration between Amazon Redshift and Immuta.

  1. In Immuta, choose App Settings from the navigation pane.
  2. Click on Integrations.
  3. Click on Add Native Integration.
  4. Enter the Redshift data warehouse endpoint name, port number, and a database name where Immuta will create policies.
  5. Enter privileged user credentials to connect with administrative privileges. These credentials aren’t stored on the Immuta platform and are used for one-time setup.
  6. You should see a successful integration with a status of Enabled.

3. Create a connection

The next step is to create a connection to the Redshift data warehouse and select specific data sources to import.

  1. In Immuta, choose Data Sources and then New Data sources in the navigation pane and choose New Data Source.
  2. Select Redshift as the Data Platform.
    Create Data Source
  3. Enter the Redshift data warehouse endpoint as the Server and the credentials to connect. Ensure the Redshift security group has inbound rules created to open access from Immuta IP addresses.
    Create Data Source2
  4. Immuta will show the schemas available on the connected database.
  5. Choose Edit under Schema/Table section.
    Schemas
  6. Select pschema from the list of schemas displayed.
    pschema
  7. Leave the values for the remaining options as the default and choose Create. This will import the metadata of the datasets and run default data discovery. In 2 to 5 minutes, you should see the table imported with status as Healthy.
    Healthy Source

4. Tag the data fields

Immuta automatically tags the data members using a default framework. It’s a starter framework that contains all the built-in and custom defined identifiers. However, you might want to add custom tags to the data fields to fit your use case. In this section, you will create custom tags and attach them to data fields. Optionally, you can also integrate with an external data catalog such as Alation, or Colibra. For this post, you will use custom tags.

Create tags

  1. In Immuta, choose Governance from the navigation pane, and then choose Tags.
  2. Choose Add Tags to open the Tag Builder dialog box
    Tags
  3. Enter Sensitive as a custom tag and choose Save.

Tags

  1. Repeat steps 1–3 to create the following tags.
    • Doctor ID: Tag to mark the doctor ID field. It will be used for defining an attribute bases access policy (ABAC).
    • Doctor Datasets: Tag to mark data sources accessible to Doctors.
    • Admin Datasets: Tag to mark data sources accessible to Admins.
    • Nurse Datasets: Tag to mark data sources accessible to Nurses.

Add tags

Now add the Sensitive tag to the ssn and passport fields in the Pschema Patient data source.

  1. In Immuta, choose Data and then Data Sources in the navigation pane and select Pschema Patient as the data source.
  2. Choose the Data Dictionary tab
  3. Find ssn in the list and choose Add Tags.

Tags

  1. Search for Sensitive tag and choose Add.

Tags

  1. Repeat the same step for the passport
  2. You should see tags applied to the fields.

Tags

  1. Using the same procedure, add the Doctor ID tag to the drid (doctor ID) field in the Pschema Patients data source.

Attributes

Now tag the data sources as required by the access policy you’re building.

  1. Choose Data and then Data Sources and select Pschema Patients as the data source.
  2. Scroll down to Tags and choose Add Tags
  3. Add Doctor Datasets, Nurse Datasets, and Admin Datasets tags to the patients data source (because this data source should be accessible by the Doctors, Nurses, and Admins groups).
Data Source Tags
Patients Doctor Datasets, Nurse Datasets, Admin Datasets
Conditions Doctor Datasets
Immunizations Doctor Datasets, Nurse Datasets
Encounters Doctor Datasets, Admin Datasets

You can create more tags and tag fields as required by your organization’s data classification rules. The Immuta data source page is where stewards and governors will spend a lot of time.

5. Create groups and add users

You must create user groups before you define policies.

  1. In Immuta, choose People and then Groups from the navigation pane and then choose New Group.
  2. Provide doctor as the group name and select Save.
  3. Repeat step1 and step2 to create the following groups:
    • nurse
    • admin
  4. You should see three groups created.

Groups

Next, you need to add users to these groups.

  1. Choose People and then Groups in the navigation pane.
  2. Select the doctor
  3. Choose Settings and choose Add Members in the Members
  4. Search for Dr Jon King in the search bar and select the user from the results. Choose close to add the user and exit the screen.
  5. You should see Dr Jon King added to the doctor.

Groups

  1. Repeat to add additional users as shown in the following table.
Group Users
Doctor Dr Jon King, Dr Chris
Nurse Jane D
admin David Mill, Ema Joseph

6. Add attributes to users

One of the security requirements is that doctors can only see the data of their patients. They shouldn’t be able to see other doctors’ patient data. To implement this requirement, you must define attributes for users who are doctors.

  1. Choose People and then Users in the navigation pane, and then select Dr Chris.
  2. Choose Settings and scroll down to the Attributes
  3. Choose Add Attributes. Enter drid as the Attribute and d1001 as the Attribute value.
  4. This will assign the attribute value of d1001 to Dr Chris. In Step 8 Define data policies, you will define a policy to show data with the matching drid attribute value.

Group Attributes

  1. Repeat steps 1–4; selecting Dr Jon King and entering d1002 as the Attribute value

7. Create subscription policy

In this section, you will provide data sources access to groups as required by the permission policy.

  • Doctors can access all four datasets: Patients, Conditions, Immunizations, and Encounters.
  • Nurses can access only Patients and Immunizations.
  • Admins can access only Patients and Encounters.

In 4. Tag the data fields, you added tags to the datasets as shown in the following table. You will now use the tags to define subscription policies.

Data source Tags
Patients Doctor Datasets, Nurse Datasets, Admin Datasets
Conditions Doctor Datasets
Immunizations Doctor Datasets, Nurse Datasets
Encounters Doctor Datasets, Admin Datasets
  1. In Immuta, choose Policies and then Subscription Policies from the navigation pane, and then choose Add Subscription Policy.
  2. Enter Doctor Access as the policy name.
  3. For the Subscription level, select Allow users with specific groups/attributes.
  4. Under Allow users to subscribe when user, select doctor. This allows only users who are members of the doctor group to access data sources accessible by doctor group.

Subscription Policy

  1. Scroll down and select Share Responsibility. This will ensure users aren’t blocked from accessing datasets even if they don’t meet all the subscription policies, which isn’t required.

Shared Responsibility

  1. Scroll further down and under Where should this policy be applied, choose On data sources, tagged and Doctor Dataset as options. It selects the datasets tagged as Doctor Dataset. You can notice that this policy applies all 4 data sources as all four data sources are tagged as Doctor Datasets.

Subscription Policy

  1. Next, create the policy by choose Activate This will create the view and policies in Redshift and enforce the permission policy.
  2. Repeat the same steps to define Nurse Access and Admin Access
    • For the Nurse Access policy, select users who are a member of the Nurse group and data sources that are tagged as Nurse Datasets.
    • For the Admin Access policy, select users who are member of the Admin group and data sources that are tagged as Admin Datasets.
  3. In Subscription policies, you should see all three policies in Active Notice the Data Sources count for how many data sources the policy is applied to.

Subscription Policy

8. Define data policies

 So far, you have defined permission policies at the data sources level. Now, you will define row and column level access using data policies. The fine-grained permission policy that you should define to restrict rows and columns is:

  • Doctors can see only the data of their own patients. In other words, when a doctor queries the patients table, then they should see only patients that match their doctor ID (drid).
  • Sensitive fields, such as ssn or passport, should be masked for everyone.
  1. In Immuta, Choose Policies and then Data Policies in the navigation pane and then choose Add Data Policy.
  2. Enter Filter by Doctor ID as the Policy name.
  3. Under How should this policy protect the data?, choose options as Only show rows , where, user possesses an attribute in drid that matches the value in column tagged Doctor ID. These settings will enforce that a doctor can see only the data of patients that have a matching Doctor ID. All other users (members of the nurse and admin groups) can see all of the patients

Data Policy

  1. Scroll down and under Where should this policy be applied?, choose On data sources, with columns tagged, Doctor ID as options. It selects the data sources that have columns tagged as Doctor ID. Notice the number of data sources it selected. It applied the policy to one data source out of the four available. Remember that you added the Doctor ID tag to the drid field for the Patients data source. So, this policy identified the Patients data source as a match and applied the policy.
    Policy
  2. Choose Activate Policy to create the policy.
  3. Similarly, create another policy to mask sensitive data for everyone.
    • Provide Mask Sensitive Data as policy name.
    • Under How should this policy protect the data?, choose Mask, columns tagged, Sensitive, using hashtag, for, everyone.
    • Under Where should this policy be applied?, choose on data sources, with columns tagged, Sensitive.

Data Policy

  1. In the Data Policies screen, you should now see both data policies in Active

Data Policy

9. Query the data to validate policies

The required permission policies are now in place. Sign in to the Redshift Query Editor as different users to see the permission policies in effect.

For example,

  1. Sign in as Dr. Jon King using the Redshift user ID jon. You should see all four tables, and if you query the patients table, you should see only the patients of Dr. Jon King; that is, patients with the Doctor ID d10002.
  2. Sign in as Ema Joseph using the Redshift user ID ema. You should see only two tables, Patients and Encounters, which are Admin datasets.
  3. You will also notice that ssn and passport are masked for both users.

Audit

 Immuta’s comprehensive auditing capabilities provide organizations with detailed visibility and control over data access and usage within their environment. The platform generates rich audit logs that capture a wealth of information about user activities, including:

  • Who’s subscribing to each data source and the reasons behind their access
  • When users are accessing the data
  • The specific SQL queries and blob fetches they are executing
  • The individual files they are accessing

The following is an example screenshot.

Audit

Industry use cases

The following are example industry use cases where Immuta and Amazon Redshift integration adds value to customer business objectives. Consider enabling the following use cases on Amazon Redshift and using Immuta.

Patient records management

In the healthcare and life sciences (HCLS) industry, efficient access to quality data is mission critical. Disjointed tools can hinder the delivery of real-time insights that are critical for healthcare decisions. These delays negatively impact patient care, as well as the production and delivery of pharmaceuticals. Streamlining access in a secure and scalable manner is vital for timely and accurate decision-making.

Data from disparate sources can easily become siloed, lost, or neglected if not stored in an accessible manner. This makes data sharing and collaboration difficult, if not impossible, for teams who rely on this data to make important treatment or research decisions. Fragmentation issues lead to incomplete or inaccurate patient records, unreliable research results, and ultimately slow down operational efficiency.

Maintaining regulatory compliance

HCLS organizations are subject to a range of industry-specific regulations and standards, such as Good Practices (GxP) and HIPAA, that ensure data quality, security, and privacy. Maintaining data integrity and traceability is fundamental, and requires robust policies and continuous monitoring to secure data throughout its lifecycle. With diverse data sets and large amounts of sensitive personal health information (PHI), balancing regulatory compliance with innovation is a significant challenge.

Complex advanced health analytics

Limited machine learning and artificial intelligence capabilities—hindered by legitimate privacy and security concerns—restrict HCLS organizations from using more advanced health analytics. This constraint affects the development of next-generation, data-driven tactics, including patient care models and predictive analytics for drug research and development. Enhancing these capabilities in a secure and compliant manner is key to unlocking the potential of health data.

Conclusion

In this post, you learned how to apply security policies on Redshift datasets using Immuta with an example use case. That includes enforcing data-set level access, attribute-level access and data masking policies. We also covered implementation step by step. Consider adopting simplified Redshift access management using Immuta and let us know your feedback.


About the Authors

Satesh Sonti is a Sr. Analytics Specialist Solutions Architect based out of Atlanta, specialized in building enterprise data platforms, data warehousing, and analytics solutions. He has over 19 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.

Matt Vogt is a seasoned technology professional with over two decades of diverse experience in the tech industry, currently serving as the Vice President of Global Solution Architecture at Immuta. His expertise lies in bridging business objectives with technical requirements, focusing on data privacy, governance, and data access within Data Science, AI, ML, and advanced analytics.

Navneet Srivastava is a Principal Specialist and Analytics Strategy Leader, and develops strategic plans for building an end-to-end analytical strategy for large biopharma, healthcare, and life sciences organizations. His expertise spans across data analytics, data governance, AI, ML, big data, and healthcare-related technologies.

Somdeb Bhattacharjee is a Senior Solutions Architect specializing on data and analytics. He is part of the global Healthcare and Life sciences industry at AWS, helping his customer modernize their data platform solutions to achieve their business outcomes.

Ashok Mahajan is a Senior Solutions Architect at Amazon Web Services. Based in NYC Metropolitan area, Ashok is a part of Global Startup team focusing on Security ISV and helps them design and develop secure, scalable, and innovative solutions and architecture using the breadth and depth of AWS services and their features to deliver measurable business outcomes. Ashok has over 17 years of experience in information security, is CISSP and Access Management and AWS Certified Solutions Architect, and have diverse experience across finance, health care and media domains.