Integrate Amazon Redshift row-level security with Amazon Redshift native IdP authentication

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers.

As enterprise customers look to build their data warehouse on Amazon Redshift, they have business requirements to enforce adequate fine-grained access controls to govern who can access which rows of their sensitive data based on the user profiler. Furthermore, many organizations want to access Amazon Redshift using their existing identity provider (IdP) while maintaining these compliance and security requirements around their data. Without proper integrated features to enforce row-level and secure authentication, you may have to develop additional solutions such as views, or try to integrate third-party solutions around your data to enforce security.

With the introduction of row-level security in Amazon Redshift, you can restrict user access at the row level. Additionally, we have introduced a native IdP functionality to help you implement authentication and authorization with your choice of business intelligence (BI) tools in a seamless way.

Amazon Redshift row-level security (RLS) provides granular access control over your sensitive data. It does this by using RLS policies to determine which rows to return in the query result sets.

In this post, we walk you through an example on how you can implement row-level security in Amazon Redshift while using existing IdP credentials to simplify authentication and managing permissions. You can use this flexible solution to provide complete control over data access while maintaining authorization using your existing IdP.

Solution overview

For our use case, an organization requires row-level security to restrict access to sales performance data to specific states and their allocated salesperson. We have the following business rules and conditions:

Alice, the salesperson for NY, should have access to NY sales data only
Bob, the salesperson for CA, should get access to CA sales data only
Charlie, the sales manager for the North America region, should have access to sales data for all states
Jen, who belongs to HR department, shouldn’t have access to any sales data

The following diagram illustrates the solution architecture we implement to solve this problem statement using Amazon Redshift row-level security and Amazon Redshift native IdP authentication.

The solution contains the following steps:

Create RLS policies to provide fine-grained access control for row-level data on the Sales table.
Create Amazon Redshift roles for each of the different Azure AD groups and assign relevant permissions to the table.

With native IdP, roles get created automatically based on Azure groups. However, as a best practice, we’re pre-creating the Amazon Redshift roles and assigning relevant permissions.

Attach row-level security policies to the roles.
Configure a JDBC or ODBC driver in your SQL client to use Azure AD federation and use Azure AD login credentials to sign in.
Upon successful authentication, Azure AD issues an authentication token (OAuth token) back to the Amazon Redshift driver.
The driver forwards the authentication token to the Amazon Redshift cluster to initiate a new database session. Amazon Redshift verifies and validates the authentication token.
Amazon Redshift calls the Azure Graph API to obtain the user’s group membership.
Amazon Redshift maps the logged-in Azure AD user to the Amazon Redshift user and maps the Azure AD groups to Amazon Redshift roles.
The Amazon Redshift roles are pre-mapped with the RLS policies mentioned in step 3. This allows the respective users to query the fine-grained row-level access data from the client.

Prerequisites

To implement this solution, you must have the following prerequisites:

An AWS account. If you don’t have one, you can sign up for one.
An Amazon Redshift Serverless or Amazon Redshift provisioned cluster. For setup instructions, refer to Setting up Amazon Redshift Serverless for the first time or Create a sample Amazon Redshift cluster.
A Microsoft Azure account that has an active subscription. You need to be an admin role to set up the application on Azure AD.
Power BI Desktop version 2.102.683.0 64-bit or higher version downloaded and installed.
The latest Amazon Redshift JDBC SDK driver-dependent libraries downloaded and the Amazon Redshift JDBC JAR .zip folder unzipped. Note that all prior Amazon Redshift driver versions before JDBC version 2.1.0.4 don’t support native IdP.
Any SQL client. For this post, we use SQL Workbench/J.

Implement your Amazon Redshift native IdP

To set up your Amazon Redshift native IdP setup, refer to Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client. Follow the steps to set up your Azure application and collect Azure AD information for the Amazon Redshift IdP.

For this post, we have created the following four groups in Azure AD:

sales_ny
sales_ca
sales_manager
hr_group

Then we created the following four users in Azure AD:

Alice – The salesperson in NY state
Bob – The salesperson in CA state
Charlie – The manager for the North America region
Jen – A member of the HR group

Add the respective users to their appropriate group:

Alice – sales_ny
Bob – sales_ca
Charlie – sales_manager
Jen – HR

Next, we need to register the IdP in Amazon Redshift using the following command:

CREATE IDENTITY PROVIDER rls_idp TYPE
azure NAMESPACE 'aad'
PARAMETERS '{
"issuer":"https://sts.windows.net/87f4aa26-78b7-410e-bf29-57b39929ef9a/",
"audience":["https://analysis.windows.net/powerbi/connector/AmazonRedshift",
"api://991abc78-78ab-4ad8-a123-zf123ab03612p"],
"client_id":"123ab555-a321-666d-7890-11a123a44890",
"client_secret":"KiG7Q~FEDnE.VsWS1IIl7LV1R2BtA4qVv2ixB" }'
;

In the preceding statement, the type azure indicates that the provider specifically facilitates communication with Microsoft Azure AD. We use the following parameters to collect Azure AD information (for more information, refer to Collect Azure AD Information in Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client).

issuer – The issuer ID to trust when a token is received. The unique identifier for the tenant_id is appended to the issuer.
client_id – The unique public identifier of the application registered with the IdP. This can be referred to as the application ID.
client_secret – A secret identifier, or password, known only to the IdP and the registered application.
audience – The application ID that is assigned to the application in Azure. For this post, we connect with Amazon Redshift using Power BI Desktop and SQL Workbench/J. The audience value is hardcoded for Power BI desktop, for example https://analysis.windows.net/powerbi/connector/AmazonRedshift . The second audience value is for the SQL client, which you get from the application ID URI in the OAuth application. For example, api://991abc78-78ab-4ad8-a123-zf123ab03612p.

Use the following command to view the registered IdP on Amazon Redshift:

DESC IDENTITY PROVIDER rls_idp;

Use the following command to view all the IdPs registered:

select * from svv_identity_providers;

The following Sales table contains information about each salesperson, the respective state they cover, and their total sales amount:

CREATE TABLE SALES (sales_person VARCHAR(30), state CHAR(2), "total_sales" INT);
INSERT INTO SALES VALUES ('Alice', 'NY', 5000);
INSERT INTO SALES VALUES ('Bob', 'CA', 6000);
INSERT INTO SALES VALUES ('Sally', 'IL', 7000);

Now we create four roles in the Amazon Redshift cluster based on the groups that we created on the Azure AD portal and assign relevant permissions to them. This simplifies administration by assigning different permissions to different roles and assigning them to different users.

The role name in the Amazon Redshift cluster looks like <namespace>:<azure_ad_group_name>, where the namespace is the one we provided in the IdP creation command (aad) and the group name is the Azure AD group. See the following code:

CREATE ROLE "aad:sales_ny";
CREATE ROLE "aad:sales_ca";
CREATE ROLE "aad:sales_manager";
CREATE ROLE "aad:hr";

Now we grant permission to the Amazon Redshift role on the appropriate tables. For this post, we assign SELECT permission on the Sales table for all four roles:

GRANT SELECT ON TABLE SALES TO ROLE "aad:sales_ny";
GRANT SELECT ON TABLE SALES TO ROLE "aad:sales_ca";
GRANT SELECT ON TABLE SALES TO ROLE "aad:sales_manager";
GRANT SELECT ON TABLE SALES TO ROLE "aad:hr";

Use the following command to view all the roles in the cluster:

select * from svv_roles;

Create a row-level security policy

Let’s enforce an RLS policy on the Sales table to restrict access to sales performance information for a salesperson specific to a particular state. We create the following policy:

CREATE RLS POLICY policy_sales_ny
WITH (state char(2))
USING (state = 'NY');
CREATE RLS POLICY policy_sales_ca
WITH (state char(2))
USING (state = 'CA');

The sales manager is also required to view sales across the North American region. For this, we create the following policy:

CREATE RLS POLICY policy_sales_all
USING (true);

The policy_sales_all policy allows the sales manager to view all the information in the sales table.

Attach the row-level security policy to roles

Now we have to attach the row-level security policies to their respective Amazon Redshift roles so that when the user logs in using their Amazon Redshift native IdP, they can get fine-grained access to the records.

ATTACH RLS POLICY policy_sales_ny ON public.sales TO ROLE "aad:sales_ny";
ATTACH RLS POLICY policy_sales_ca ON public.sales TO ROLE "aad:sales_ca";
ATTACH RLS POLICY policy_sales_all ON public.sales TO ROLE "aad:sales_manager";

For the HR role, we haven’t created or attached any RLS policy because we don’t want any user from the HR group to get access to sales records.

Enable row-level security on the table

Now let’s enable row-level security on the respective tables. In this demo, we enable the RLS policy on the Sales table using the following command:

ALTER TABLE public.sales ROW LEVEL SECURITY ON;

Use the following command to view the RLS policies:

SELECT * FROM svv_rls_attached_policy;

Test row-level security using Power BI Desktop

In this example, we use Microsoft Power BI Desktop to connect with Amazon Redshift using a native IdP. For this solution, use Microsoft Power BI Desktop- Version: 2.102.683.0 64-bit and above.

In your Microsoft Power BI Desktop, choose Get data.

Search for the Amazon Redshift connector, choose it, and choose Connect.

For Server, enter your Amazon Redshift cluster’s endpoint. For example: test-cluster.ct4abcufthff.us-east-1.redshift.amazonaws.com.
For Database, enter your database name (for this post, we enter dev).
Choose OK.

Choose Microsoft Account.

Choose Sign in.

Enter your Microsoft Account credentials in the authorization dialog. For this example, we sign in with user Alice.

Choose Next.

Once connected, you will see the message “You are currently signed in.”

Choose Connect.

As shown in the following screenshot, Azure AD user Alice is able to authenticate using an Amazon Redshift native IdP, and the RLS policies were applied automatically, allowing Alice to access sales performance information for only NY state.

Similarly, we can try signing in as user Bob and see only CA state information.

Charlie belongs to the manager role where the view all policy has been applied, so when he signs in, he is able to view all the rows in the sales table.

Finally, when Jen signs in, she can access the table, but isn’t able to view any sales records because no RLS policy has been attached to the HR role.

If we haven’t granted SELECT on the sales table to the role aad:hr, which Jen belongs to, then she can’t access the sales table.

Test row-level security using SQL Workbench/J

Now we test row-level security with an Amazon Redshift native IdP using SQL Workbench/J.

Create a new connection in SQL Workbench/J and choose Amazon Redshift as the driver.
Choose Manage drivers and add all the files from the downloaded AWS JDBC driver pack .zip file. (Remember to unzip the file.)

Make sure to use the Amazon Redshift driver 2.1.0.4 onwards, because all previous Amazon Redshift driver versions don’t support the Amazon Redshift native IDP feature.

For URL, enter jdbc:redshift://<cluster endpoint>:<port>:<databasename>. For example: jdbc:redshift://test-cluster.ab6yejheyhgf.us-east-1.redshift.amazonaws.com:5439/dev.

On the Driver properties tab, add the following properties:
1. plugin_name – com.amazon.redshift.plugin.BrowserAzureOAuth2CredentialsProvider
2. listen_port – 7890
3. idp_response_timeout – 50
4. scope – Enter the scope value from the OAuth application. For example, api://991abc78-78ab-4ad8-a123-zf123ab03612p/jdbc_login.
5. client_id – Enter the client_id value from the OAuth application. For example, 991abc78-78ab-4ad8-a123-zf123ab03612p.
6. idp_tenant – Enter the tenant ID value from the OAuth application. For example, 87f4aa26-78b7-410e-bf29-57b39929ef9a.

Choose OK from SQL Workbench/J.

You’re redirected to the browser to sign in with your Azure AD credentials.

As shown in the following screenshot, Azure AD user Alice is able to authenticate using an Amazon Redshift native IdP and view only sales performance information for NY state.

Similarly, we can re-authenticate and sign in as user Bob, who is able to view sales information specific to CA state.

When Charlie signs in, he is able to view all the rows from every state.

Finally, when Jen signs in, she is able to access the table, but can’t view any sales records because no RLS policy has been attached to the HR role.

If we haven’t granted SELECT on the sales table to the role aad:hr, which Jen belongs to, then Jen can’t access the sales table.

Summary

In this post, we covered how you can achieve a secure end-to-end experience using Amazon Redshift native IdP authentication, which simplifies administration and row-level security to enable fine-grained row-level access in Amazon Redshift.

For more information about Amazon Redshift row-level security and native IdP federation, refer to:

About the authors

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Harshida Patel is a Specialist Sr. Solutions Architect, Analytics, with AWS.

Kiran Chinta is a Software Development Manager at Amazon Redshift. He leads a strong team in query processing, SQL language, data security, and performance. Kiran is passionate about delivering products that seamlessly integrate with customers’ business applications with the right ease of use and performance. In his spare time, he enjoys reading and playing tennis.

Debu Panda is a Senior Manager, Product Management, with AWS. He is an industry leader in analytics, application platforms, and database technologies, and has more than 25 years of experience in the IT world. Debu has published numerous articles on analytics, enterprise Java, and databases, and has presented at multiple conferences such as AWS re:Invent, Oracle Open World, and Java One. He is lead author of the EJB 3 in Action (Manning Publications 2007, 2014) and Middleware Management (Packt).

AWS Big Data Blog