AWS Big Data Blog
Enhance data security with fine-grained access controls in Amazon DataZone
Fine-grained access control is a crucial aspect of data security for modern data lakes and data warehouses. As organizations handle vast amounts of data across multiple data sources, the need to manage sensitive information has become increasingly important. Making sure the right people have access to the right data, without exposing sensitive information to unauthorized individuals, is essential for maintaining data privacy, compliance, and security.
Today, Amazon DataZone has introduced fine-grained access control, providing you granular control over your data assets in the Amazon DataZone business data catalog across data lakes and data warehouses. With the new capability, data owners can now restrict access to specific records of data at row and column levels, instead of granting access to the entire data asset. For example, if your data contains columns with sensitive information such as personally identifiable information (PII), you can restrict access to only the necessary columns, making sure sensitive information is protected while still allowing access to non-sensitive data. Similarly, you can control access at the row level, allowing users to see only the records that are relevant to their role or task.
In this post, we discuss how to implement fine-grained access control with row and column asset filters using this new feature in Amazon DataZone.
Row and column filters
Row filters enable you to restrict access to specific rows based on criteria you define. For instance, if your table contains data for two regions (America and Europe) and you want to make sure that employees in Europe only access data relevant to their region, you can create a row filter that keeps only the rows where the region is Europe (for example, region = 'Europe'). This way, employees in Europe won’t have access to America’s data.
Column filters allow you to limit access to specific columns within your data assets. For example, if your table includes sensitive information such as PII, you can create a column filter to exclude PII columns. This makes sure subscribers can only access non-sensitive data.
The row and column asset filters in Amazon DataZone enable you to control who can access what using a consistent, business user-friendly mechanism for all of your data across AWS data lakes and data warehouses. To use fine-grained access control in Amazon DataZone, you can create row and column filters on top of your data assets in the Amazon DataZone business data catalog. When a user requests a subscription to your data asset, you can approve the subscription by applying the appropriate row and column filters. Amazon DataZone enforces these filters using AWS Lake Formation and Amazon Redshift, making sure the subscriber can only access the rows and columns that they are authorized to use.
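To make the semantics concrete, here is a minimal Python sketch that mimics what a row filter and a column filter do to a small in-memory table. It is only an illustration: the sample rows, the column names, and both helper functions are hypothetical, and in practice Amazon DataZone enforces filters through AWS Lake Formation and Amazon Redshift rather than in application code.

```python
# Hypothetical sample rows; the column names are illustrative only.
ROWS = [
    {"order_id": 1, "region": "Europe", "customer_email": "a@example.com", "amount": 120},
    {"order_id": 2, "region": "America", "customer_email": "b@example.com", "amount": 75},
    {"order_id": 3, "region": "Europe", "customer_email": "c@example.com", "amount": 40},
]


def apply_row_filter(rows, column, value):
    """Keep only the rows whose `column` equals `value` (for example, region = 'Europe')."""
    return [row for row in rows if row[column] == value]


def apply_column_filter(rows, included_columns):
    """Keep only the listed columns in every row; all other columns are dropped."""
    return [{name: row[name] for name in included_columns} for row in rows]


# A subscriber approved with both filters sees Europe rows without the PII column.
europe_rows = apply_row_filter(ROWS, "region", "Europe")
visible = apply_column_filter(europe_rows, ["order_id", "region", "amount"])

for row in visible:
    print(row)
```

The two filters compose: the row filter narrows which records are returned, and the column filter narrows which fields of those records are returned.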
Solution overview
To demonstrate the new capability, we consider a sample customer use case where an electronics ecommerce platform is looking to implement fine-grained access controls using Amazon DataZone. The customer has multiple product categories, each operated by different divisions of the company. The platform governance team wants to make sure each division has visibility only to data belonging to their own categories. Additionally, the platform governance team needs to adhere to the finance team requirements that pricing information should be visible only to the finance team.
The sales team, acting as the data producer, has published an AWS Glue table called Product Sales that contains data for both the Laptops and Servers categories to the Amazon DataZone business data catalog using the project Product-Sales. The analytics teams in both the laptop and server divisions need to access this data for their respective analytics projects. The data owner’s objective is to grant data access to consumers based on the division they belong to. This means giving the laptop sales analytics team access to only the rows of data with laptop sales, and giving the server sales analytics team access to only the rows with server sales. Additionally, the data owner wants to restrict both teams from accessing the pricing data. This post demonstrates the implementation steps to achieve this use case in Amazon DataZone.
The steps to configure this solution are as follows:
- The publisher creates asset filters for limiting access:
  - We create two row filters: a Laptop Only row filter that limits access to only the rows of data with laptop sales, and a Server Only row filter that limits access to only the rows of data with server sales.
  - We also create a column filter called exclude-price-columns that excludes the price-related columns from the Product Sales asset.
- Consumers discover and request subscriptions:
  - The analyst from the laptops division requests a subscription to the Product Sales data asset.
  - The analyst from the servers division also requests a subscription to the Product Sales data asset.
  - Both subscription requests are sent to the publisher for approval.
- The publisher approves the subscriptions and applies the appropriate filters:
  - The publisher approves the request from the analyst in the laptops division, applying the Laptop Only row filter and the exclude-price-columns column filter.
  - The publisher approves the request from the analyst in the servers division, applying the Server Only row filter and the exclude-price-columns column filter.
- Consumers access the authorized data in Amazon Athena:
  - After the subscription is approved, we query the data in Athena to make sure that the analyst from the laptops division can now access only the product sales data for the Laptops category.
  - Similarly, the analyst from the servers division can access only the product sales data for the Servers category.
  - Both consumers can see all columns except the price-related columns, as per the applied column filter.
The following diagram illustrates the solution architecture and process flow.
Prerequisites
To follow along with this post, the publisher of the product sales data asset must have published a sales dataset in Amazon DataZone.
Publisher creates asset filters for limiting access
In this section, we detail the steps the publisher takes to create asset filters.
Create row filters
This dataset contains the product categories Laptops and Servers. We want to restrict each consumer’s access to the dataset based on the product category. We use the row filter feature in Amazon DataZone to achieve this.
Amazon DataZone allows you to create row filters that can be used when approving subscriptions to make sure that the subscriber can only access rows of data as defined in the row filters. To create a row filter, complete the following steps:
- On the Amazon DataZone console, navigate to the product-sales project (the project to which the asset belongs).
- Navigate to the Data tab for the project.
- Choose Inventory data in the navigation pane, then choose the asset Product Sales, where you want to create the row filter.
You can add row filters for assets of type AWS Glue tables or Redshift tables.
- On the asset detail page, on the Asset filters tab, choose Add asset filter.
We create two row filters, one each for the Laptops and Servers categories.
- Complete the following steps to create a laptop-only row filter:
  - Enter a name for this filter (Laptop Only).
  - Enter a description of the filter (Allow rows with product category as Laptop Only).
  - For the filter type, select Row filter.
  - For the row filter expression, enter one or more expressions:
    - Choose the column Product Category from the column dropdown menu.
    - Choose the operator = from the operator dropdown menu.
    - Enter the value Laptops in the Value field.
  - If you need to add another condition to the filter expression, choose Add condition. For this post, we create a filter with one condition.
  - When using multiple conditions in the row filter expression, choose And or Or to link the conditions.
  - You can also define the subscriber visibility. For this post, we kept the default value (No, show values to subscriber).
  - Choose Create asset filter.
- Repeat the same steps to create a row filter called Server Only, except this time enter the value Servers in the Value field.
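If you prefer to script these steps, the same two row filters can be described as data. The following Python sketch builds dictionaries in the shape that the Amazon DataZone CreateAssetFilter API (exposed in boto3 as create_asset_filter) appears to accept. Treat the field names (rowConfiguration, rowFilter, expression, equalTo, and, or, sensitive) as assumptions to check against the current API reference. The helper functions are hypothetical, the column name product_category is taken from the Athena view shown later in this post, and mapping the console's subscriber visibility option onto sensitive=False is also an assumption.

```python
def equal_to(column_name, value):
    """One condition: column = value (the = operator chosen in the console)."""
    return {"expression": {"equalTo": {"columnName": column_name, "value": value}}}


def all_of(*conditions):
    """Link several conditions with And."""
    return {"and": list(conditions)}


def any_of(*conditions):
    """Link several conditions with Or."""
    return {"or": list(conditions)}


def row_filter_configuration(row_filter, sensitive=False):
    """Wrap a row filter expression as an asset filter configuration.

    Assumption: sensitive=False corresponds to showing values to the subscriber.
    """
    return {"rowConfiguration": {"rowFilter": row_filter, "sensitive": sensitive}}


# The two single-condition filters created in this post.
laptop_only = row_filter_configuration(equal_to("product_category", "Laptops"))
server_only = row_filter_configuration(equal_to("product_category", "Servers"))

# A filter with two conditions linked by Or, shown only to illustrate Add condition.
either_category = row_filter_configuration(
    any_of(
        equal_to("product_category", "Laptops"),
        equal_to("product_category", "Servers"),
    )
)

print(laptop_only)
```

A dictionary like laptop_only would be passed as the configuration argument of the API call, together with the domain identifier, asset identifier, and filter name, none of which are shown here.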
Create column filters
Next, we create column filters to restrict access to columns with price-related data. Complete the following steps:
- In the same asset, add another asset filter of type column filter.
- On the Asset filters tab, choose Add asset filter.
- For Name, enter a name for the filter (for this post, exclude-price-columns).
- For Description, enter a description of the filter (for this post, exclude price data columns).
- For the filter type, select Column to create the column filter. This will display all the available columns in the data asset’s schema.
- Select all columns except the price-related ones.
- Choose Create asset filter.
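Note that the column filter works as an allow list: you select the columns to keep, and everything else is hidden. The following Python sketch derives that list from a schema by removing the price-related columns. The schema and the price column names are hypothetical, because this post does not list the actual columns of the Product Sales asset, and the columnConfiguration and includedColumnNames field names are an assumption about the CreateAssetFilter API shape that you should verify.

```python
# Hypothetical schema for the Product Sales asset; the real column names may differ.
SCHEMA = [
    "order_id",
    "product_category",
    "product_name",
    "quantity",
    "unit_price",
    "total_price",
    "order_date",
]

# Hypothetical price-related columns to hide from subscribers.
PRICE_COLUMNS = ["unit_price", "total_price"]


def included_columns(schema, excluded):
    """Return the schema columns that remain after removing `excluded`, in schema order."""
    excluded_set = set(excluded)
    unknown = excluded_set - set(schema)
    if unknown:
        # Fail loudly on a typo so a sensitive column is not exposed by accident.
        raise ValueError(f"Columns not found in schema: {sorted(unknown)}")
    return [name for name in schema if name not in excluded_set]


def column_filter_configuration(columns):
    """Wrap the allow list as an asset filter configuration."""
    return {"columnConfiguration": {"includedColumnNames": list(columns)}}


exclude_price_columns = column_filter_configuration(included_columns(SCHEMA, PRICE_COLUMNS))
print(exclude_price_columns)
```

Because the filter is an allow list, a column added to the table later is not part of the filter unless the filter is updated, which is the safer default for sensitive data.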
Consumers discover and request subscriptions
In this section, we switch to the role of an analyst from the laptop division who is working within the project Sales Analytics - Laptops. As the data consumer, we search the catalog to find the Product Sales data asset and request access by subscribing to it.
- Log in to your project as a consumer and search for the Product Sales data asset.
- On the Product Sales data asset details page, choose Subscribe.
- For Project, choose Sales Analytics - Laptops.
- For Reason for request, enter the reason for the subscription request.
- Choose Subscribe to submit the subscription request.
Publisher approves subscriptions with filters
After the subscription request is submitted, the publisher will receive the request, and they can approve it by following these steps:
- As the publisher, open the project Product-Sales.
- On the Data tab, choose Incoming requests in the left navigation pane.
- Locate the request and choose View request. You can filter by Pending to see only requests that are still open.
This opens the details of the request, where you can see details like who requested the access, for what project, and the reason for the request.
- To approve the request, there are two options:
- Full access – If you choose to approve the subscription with full access option, the subscriber will get access to all the rows and columns in our data asset.
- Approve with row and column filters – To limit access to specific rows and columns of data, you can choose the option to approve with row and column filters. For this post, we use both filters that we created earlier.
- Select Choose filter, then on the dropdown menu, choose the Laptop Only and exclude-price-columns filters.
- Choose Approve to approve the request.
After access is granted and fulfilled, the subscription looks as shown in the following screenshot.
- Now let’s log in as a consumer from the server division and request a subscription to the same asset.
- Repeat the same approval steps, but this time the publisher of the sales data approves with the Server Only row filter and the exclude-price-columns column filter. The other steps remain the same.
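The approval decisions above amount to a small mapping from each subscribing project to the filters applied to it. The following Python sketch records that mapping and turns it into an asset scope entry of the kind the AcceptSubscriptionRequest API appears to take (a list of entries with assetId and filterIds). Those field names are an assumption to verify, the filter and asset identifiers are placeholders, and the project name Sales Analytics - Servers is hypothetical because this post does not name the server division's project.

```python
# Placeholder identifiers; real filter IDs are generated when the filters are created.
FILTER_IDS = {
    "Laptop Only": "filter-id-laptop-only",
    "Server Only": "filter-id-server-only",
    "exclude-price-columns": "filter-id-exclude-price-columns",
}

# Which filters the publisher applies to each subscribing project.
# "Sales Analytics - Servers" is a hypothetical project name.
APPROVALS = {
    "Sales Analytics - Laptops": ["Laptop Only", "exclude-price-columns"],
    "Sales Analytics - Servers": ["Server Only", "exclude-price-columns"],
}


def asset_scope(asset_id, project):
    """Build one asset scope entry for approving `project` with its filters."""
    filter_names = APPROVALS[project]
    return {
        "assetId": asset_id,
        "filterIds": [FILTER_IDS[name] for name in filter_names],
    }


laptop_scope = asset_scope("asset-id-product-sales", "Sales Analytics - Laptops")
server_scope = asset_scope("asset-id-product-sales", "Sales Analytics - Servers")
print(laptop_scope)
print(server_scope)
```

Keeping the mapping in one place makes it easy to confirm that every approval pairs a division-specific row filter with the shared column filter.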
Consumers access authorized data in Athena
Now that we have successfully published an asset to the Amazon DataZone catalog and subscribed to it, we can analyze it. Let’s log in as a consumer from the laptop division.
- In the Amazon DataZone data portal, choose the consumer project Sales Analytics - Laptops.
- On the Schema tab, we can view the subscribed assets.
- Choose the project Sales Analytics - Laptops and choose the Overview tab.
- In the right pane, open the Athena environment.
We can now run queries on the subscribed table.
- Choose the table under Tables and views, then choose Preview to view the SELECT statement in the query editor.
- Run a query as the consumer of Sales Analytics - Laptops, in which we can view data only with product category Laptops.
Under Tables and views, you can expand the table product_sales. The price-related columns are not visible in the Athena environment for querying.
- Next, you can switch to the role of an analyst from the server division and analyze the dataset in a similar way.
- We run the same query and see that under product_category, the analyst can see Servers only.
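The manual checks above can also be expressed as a small verification routine. The following Python sketch builds a preview-style query and inspects result rows for two problems: rows outside the expected product category and price columns that should have been hidden. It does not call Athena. The result rows are hand-written stand-ins, the price column names are hypothetical, and only the table name product_sales and the column product_category come from this post.

```python
# Hypothetical names of the price-related columns hidden by the column filter.
PRICE_COLUMNS = {"unit_price", "total_price"}


def preview_query(table, limit=10):
    """Build a preview-style SELECT statement for the subscribed table."""
    return f'SELECT * FROM "{table}" LIMIT {limit}'


def check_access(rows, expected_category):
    """Return a list of problems found in query results; an empty list means the filters held."""
    problems = []
    for index, row in enumerate(rows):
        category = row.get("product_category")
        if category != expected_category:
            problems.append(f"row {index}: unexpected category {category!r}")
        leaked = sorted(PRICE_COLUMNS.intersection(row))
        if leaked:
            problems.append(f"row {index}: price columns visible: {leaked}")
    return problems


# Stand-in results, as the laptop analyst would see them with both filters applied.
filtered_rows = [
    {"order_id": 1, "product_category": "Laptops", "quantity": 2},
    {"order_id": 4, "product_category": "Laptops", "quantity": 1},
]

# Stand-in results for an unfiltered grant, to show what a failure looks like.
unfiltered_rows = [
    {"order_id": 1, "product_category": "Laptops", "quantity": 2, "unit_price": 999},
    {"order_id": 2, "product_category": "Servers", "quantity": 1, "unit_price": 4500},
]

print(preview_query("product_sales"))
print(check_access(filtered_rows, "Laptops"))
print(check_access(unfiltered_rows, "Laptops"))
```

Running the same check with Servers as the expected category covers the analyst from the server division.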
Conclusion
Amazon DataZone offers a straightforward way to implement fine-grained access controls on top of your data assets. This feature allows you to define column-level and row-level filters to enforce data privacy before the data is available to data consumers. Amazon DataZone fine-grained access control is generally available in all AWS Regions that support Amazon DataZone.
Try out the fine-grained access control feature in your own use case, and let us know your feedback in the comments section.
About the Authors
Deepmala Agarwal works as an AWS Data Specialist Solutions Architect. She is passionate about helping customers build out scalable, distributed, and data-driven solutions on AWS. When not at work, Deepmala likes spending time with family, walking, listening to music, watching movies, and cooking!
Leonardo Gomez is a Principal Analytics Specialist Solutions Architect at AWS. He has over a decade of experience in data management, helping customers around the globe address their business and technical needs. Connect with him on LinkedIn.
Utkarsh Mittal is a Senior Technical Product Manager for Amazon DataZone at AWS. He is passionate about building innovative products that simplify customers’ end-to-end analytics journeys. Outside of the tech world, Utkarsh loves to play music, with drums being his latest endeavor.