AWS Big Data Blog
Integrate custom applications with AWS Lake Formation – Part 2
In the first part of this series, we demonstrated how to implement an engine that uses the capabilities of AWS Lake Formation to integrate third-party applications. This engine was built using an AWS Lambda Python function.
In this post, we explore how to deploy a fully functional web client application, built with JavaScript/React through AWS Amplify (Gen 1), that uses the same Lambda function as the backend. The provisioned web application provides a user-friendly and intuitive way to view the Lake Formation policies that have been enforced.
For the purposes of this post, we use a local machine running macOS with Visual Studio Code as our integrated development environment (IDE), but you can use your preferred development environment and IDE.
Solution overview
AWS AppSync creates serverless GraphQL and pub/sub APIs that simplify application development through a single endpoint to securely query, update, or publish data.
GraphQL is a data language to enable client apps to fetch, change, and subscribe to data from servers. In a GraphQL query, the client specifies how the data is to be structured when it’s returned by the server. This makes it possible for the client to query only for the data it needs, in the format that it needs it in.
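To make the idea concrete, the following minimal Python sketch mimics how a GraphQL selection returns only the requested fields from a larger record. The `select_fields` helper and the sample record are hypothetical illustrations and are not part of GraphQL or AWS AppSync; a real GraphQL server resolves the selection set for you.

```python
from typing import Any


def select_fields(record: dict[str, Any], selection: dict[str, Any]) -> dict[str, Any]:
    """Return only the fields named in `selection`, preserving its nesting.

    A value of True selects a leaf field; a nested dict selects sub-fields.
    """
    result: dict[str, Any] = {}
    for field, sub in selection.items():
        value = record[field]
        if isinstance(sub, dict):
            # Recurse into nested objects, like a nested GraphQL selection set.
            result[field] = select_fields(value, sub)
        else:
            result[field] = value
    return result


# Hypothetical server-side record holding more data than the client needs.
table = {
    "name": "users_tbl",
    "owner": "admin",
    "location": {"bucket": "example-bucket", "prefix": "users/"},
}

# Equivalent in spirit to the GraphQL query: { name location { bucket } }
shape = {"name": True, "location": {"bucket": True}}
print(select_fields(table, shape))
```

The client receives only `name` and `location.bucket`; the `owner` and `prefix` values never leave the server.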
Amplify streamlines full-stack app development. With its libraries, CLI, and services, you can connect your frontend to the cloud for authentication, storage, APIs, and more. Amplify provides libraries for popular web and mobile frameworks, like JavaScript, Flutter, Swift, and React.
Prerequisites
The web application that we deploy depends on the Lambda function that was deployed in the first post of this series. Make sure the function is already deployed and working in your account.
Install and configure the AWS CLI
The AWS Command Line Interface (AWS CLI) is an open source tool that enables you to interact with AWS services using commands in your command line shell. To install and configure the AWS CLI, see Getting started with the AWS CLI.
Install and configure the Amplify CLI
To install and configure the Amplify CLI, see Set up Amplify CLI. Your development machine must have the following installed:
Create the application
We create a JavaScript application using the React framework.
- In the terminal, enter the following command:
- Enter a name for your project (we use lfappblog), choose React for the framework, and choose JavaScript for the variant.
You can now proceed with the next steps; ignore any warning messages. Don’t run the npm run dev command yet.
- Enter the following command:
You should now see the directory structure shown in the following screenshot.
- You can now test the newly created application by running the following command:
By default, the application is available on port 5173 on your local machine.
The base application is shown in the workspace browser.
You can close the browser window and then stop the test web server by entering q and pressing Enter in the terminal.
Set up and configure Amplify for the application
To set up Amplify for the application, complete the following steps:
- Run the following command in the application directory to initialize Amplify:
- Refer to the following screenshot for all the options required. Make sure to change the value of Distribution Directory Path to dist. The command creates and runs the required AWS CloudFormation template to create the backend environment in your AWS account.
- Install the node modules required by the application with the following command:
The output of this command will vary depending on the packages already installed on your development machine.
Add Amplify authentication
Amplify can implement authentication with Amazon Cognito user pools. Run this step before adding the function and the Amplify API capabilities so that the user pool it creates can be set as the authentication mechanism for the API. Otherwise, the API defaults to API key authentication and further modifications are required.
Run the following command and accept all the defaults:
Add the Amplify API
The application backend is based on a GraphQL API with resolvers implemented as a Python Lambda function. The API feature of Amplify can create the required resources for GraphQL APIs based on AWS AppSync (default) or REST APIs based on Amazon API Gateway.
- Run the following command to add and initialize the GraphQL API:
- Make sure to set Blank Schema as the schema template (a full schema is provided as part of this post; further instructions are provided in the next sections).
- Make sure to select Authorization modes and then Amazon Cognito User Pool.
Add Amplify hosting
Amplify can host applications using either the Amplify console or Amazon CloudFront and Amazon Simple Storage Service (Amazon S3) with the option to have manual or continuous deployment. For simplicity, we use the Hosting with Amplify Console and Manual Deployment options.
Run the following command:
Copy and configure the GraphQL API schema
You’re now ready to copy and configure the GraphQL schema file and update it with the current Lambda function name.
Run the following commands:
In the schema.graphql file, you can see that the lf-app-lambda-engine function is set as the data source for the GraphQL queries.
Copy and configure the AWS AppSync resolver template
AWS AppSync uses templates to preprocess the request payload from the client before it’s sent to the backend and postprocess the response payload from the backend before it’s sent to the client. The application requires a modified template to correctly process custom backend error messages.
Run the following commands:
In the InvokeLfAppLambdaEngineLambdaDataSource.res.vtl file, you can inspect the .vtl resolver definition.
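The response template itself is written in VTL. As a rough Python analogy of what this kind of post-processing might do, the sketch below checks a backend result for an error and raises it instead of returning data. The `error` and `data` keys and the `BackendError` class are assumptions made for illustration only; the actual payload shape is defined by the Lambda function from Part 1 and by the template you copied.

```python
from typing import Any


class BackendError(Exception):
    """Hypothetical error surfaced to the GraphQL client."""


def postprocess(lambda_result: dict[str, Any]) -> Any:
    """Mimic a response template: surface backend errors, else pass data through."""
    # Assumed convention: the backend reports failures under an "error" key.
    error = lambda_result.get("error")
    if error:
        raise BackendError(error)
    return lambda_result.get("data")


# A successful result is passed through unchanged.
print(postprocess({"data": {"rows": []}}))

# A failed result becomes an error the client can display.
try:
    postprocess({"error": "Access denied"})
except BackendError as exc:
    print(f"Backend error: {exc}")
```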
Copy the application client code
As the last step, copy the application client code:
You can now open App.jsx to inspect it.
Publish the full application
From the project directory, run the following command to verify all resources are ready to be created on AWS:
Run the following command to publish the full application:
This will take several minutes to complete. Accept all defaults apart from Enter maximum statement depth [increase from default if your schema is deeply nested], which must be set to 5.
All the resources are now deployed on AWS and ready for use.
Use the application
You can start using the application from the Amplify hosted domain.
- Run the following command to retrieve the application URL:
At first access, the application shows the Amazon Cognito login page.
- Choose Create Account and create a user with user name user1 (this is mapped in the application to the role lf-app-access-role-1, for which we created Lake Formation permissions in the first post).
- Enter the confirmation code that you received through email and choose Sign In.
When you’re logged in, you can start interacting with the application.
Controls
The application offers several controls:
- Database – You can select a database registered with Lake Formation with the Describe permission.
- Table – You can choose a table with Select permission.
- Number of records – This indicates the number of records (between 5–40) to display on the data tabs. Because this is a sample application, no pagination was implemented in the backend.
- Row type – Enable this option to display only rows that have at least one cell with authorized data. If all cells in a row are unauthorized and the checkbox is selected, the row is not displayed.
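The behavior of the last two controls can be sketched in a few lines of Python. This is a minimal illustration, not the application's actual code: the `clamp_record_count` and `visible_rows` helpers, the sample rows, and the choice to represent masked cells with the string `Unauthorized` are all assumptions.

```python
UNAUTHORIZED = "Unauthorized"  # assumed marker for cells the user can't read

Row = dict[str, str]


def clamp_record_count(requested: int, low: int = 5, high: int = 40) -> int:
    """Keep the requested number of records within the 5-40 range."""
    return max(low, min(high, requested))


def visible_rows(rows: list[Row], only_authorized: bool) -> list[Row]:
    """Drop rows whose cells are all unauthorized when the option is enabled."""
    if not only_authorized:
        return list(rows)
    return [
        row for row in rows
        if any(cell != UNAUTHORIZED for cell in row.values())
    ]


# Hypothetical rows after cell filters have been applied.
rows = [
    {"uid": "1", "surname": UNAUTHORIZED},
    {"uid": UNAUTHORIZED, "surname": UNAUTHORIZED},
]

print(clamp_record_count(100))
print(visible_rows(rows, only_authorized=True))
print(len(visible_rows(rows, only_authorized=False)))
```

With the option enabled, only the first row survives because it still has one readable cell; with the option disabled, both rows are returned.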
Outputs
The application has four outputs, organized in tabs.
Unfiltered Table Metadata
This tab displays the response of the AWS Glue GetUnfilteredTableMetadata API for the selected table. The following is an example of the content:
Unfiltered Partitions Metadata
This tab displays the response of the AWS Glue GetUnfilteredPartitionsMetadata API for the selected table. The following is an example of the content:
Authorized Data
This tab displays a table that shows the columns, rows, and cells that the user is authorized to access.
A cell is marked as Unauthorized if the user has no permissions to access its contents, according to the cell filter definition. You can choose the unauthorized cell to view the relevant cell filter condition.
In this example, the user can’t access the value of the surname column in the first row because state is canada for that row, but the cell can be accessed only when state='united kingdom'.
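The masking logic described above can be sketched as follows. This is an illustrative approximation rather than the engine's implementation: Lake Formation expresses cell filters as row filter expressions, whereas this sketch uses plain Python predicates instead of parsing those expressions. The `apply_cell_filters` helper and the sample rows are hypothetical.

```python
from typing import Callable

UNAUTHORIZED = "Unauthorized"
Row = dict[str, str]
# Maps a column name to a predicate that decides whether the cell is readable.
CellFilters = dict[str, Callable[[Row], bool]]


def apply_cell_filters(row: Row, filters: CellFilters) -> Row:
    """Return a copy of `row` with cells masked when their condition fails."""
    masked: Row = {}
    for column, value in row.items():
        condition = filters.get(column)
        # Columns without a filter condition are left untouched.
        if condition is None or condition(row):
            masked[column] = value
        else:
            masked[column] = UNAUTHORIZED
    return masked


# The surname cell is readable only when state is 'united kingdom'.
filters: CellFilters = {"surname": lambda r: r["state"] == "united kingdom"}

rows = [
    {"uid": "1", "surname": "Smith", "state": "canada"},
    {"uid": "2", "surname": "Jones", "state": "united kingdom"},
]
for row in rows:
    print(apply_cell_filters(row, filters))
```

The first row keeps its uid and state values but has surname masked, which matches the example in the screenshot.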
If the Only rows with authorized data control is unchecked, rows with all cells set to Unauthorized are also displayed.
All Data
This tab shows a table with all the rows and columns in the selected table (the unfiltered data). Comparing it with the Authorized Data tab helps you understand how cell filters are applied to the unfiltered data.
Test Lake Formation permissions
Log out of the application and go to the Amazon Cognito login form, choose Create Account, and create a new user called user2 (this is mapped in the application to the role lf-app-access-role-2, for which we created Lake Formation permissions in the first post). Get table data and metadata for this user to see how Lake Formation permissions are enforced and how the two users see different data on the Authorized Data tab.
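The user-to-role mapping can be pictured as a simple lookup. The sketch below is an assumption about its shape, not the application's actual code; this post doesn't show where the mapping is stored. The `role_arn_for` helper, the placeholder account ID, and the assumption that the roles have no IAM path are hypothetical.

```python
# User names and role names come from this post; the structure is assumed.
USER_ROLE_MAP = {
    "user1": "lf-app-access-role-1",
    "user2": "lf-app-access-role-2",
}


def role_arn_for(username: str, account_id: str) -> str:
    """Build the IAM role ARN for an application user, or raise if unmapped."""
    try:
        role_name = USER_ROLE_MAP[username]
    except KeyError:
        raise PermissionError(f"No role mapped for user {username!r}") from None
    # Assumes the role was created without an IAM path.
    return f"arn:aws:iam::{account_id}:role/{role_name}"


# 111122223333 is a placeholder account ID.
print(role_arn_for("user2", "111122223333"))
```

Rejecting unmapped users outright, rather than falling back to a default role, keeps the sketch from granting access by accident.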
The following screenshot shows that the Lake Formation permissions we created grant access to the following data (all rows, all columns) of table users_partitioned_tbl to user2 (mapped to lf-app-access-role-2).
The following screenshot shows that the Lake Formation permissions we created grant access to the following data (all rows, but only city, state, and uid columns) of table users_tbl to user2 (mapped to lf-app-access-role-2).
Considerations for the GraphQL API
You can use the AWS AppSync GraphQL API deployed in this post for other applications; the responses of the GetUnfilteredTableMetadata and GetUnfilteredPartitionsMetadata AWS Glue APIs are fully mapped in the GraphQL schema. You can use the Queries page on the AWS AppSync console, which is based on GraphiQL, to run the queries.
You can use the following object to define the query variables:
The following code shows the queries available with input parameters and all fields defined in the schema as output:
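If you call the API from another application instead of the console, a request could be assembled along the lines of the following Python sketch. It only builds the HTTP request and doesn't send it. The query name, field names, variable names, endpoint URL, and token are hypothetical placeholders; use the queries defined in schema.graphql and your own AWS AppSync endpoint. The sketch also assumes the API uses Amazon Cognito user pool authorization, with the user's JWT passed in the Authorization header.

```python
import json
import urllib.request


def build_graphql_request(
    endpoint: str, jwt_token: str, query: str, variables: dict
) -> urllib.request.Request:
    """Build a GraphQL-over-HTTP POST request with a JSON body."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: Cognito user pool auth, token sent as-is.
            "Authorization": jwt_token,
        },
        method="POST",
    )


# Hypothetical query; replace with one defined in schema.graphql.
QUERY = """
query GetTable($db: String!, $table: String!) {
  getTableMetadata(db: $db, table: $table) { name }
}
"""

request = build_graphql_request(
    "https://example.appsync-api.eu-west-1.amazonaws.com/graphql",
    "<cognito-jwt>",
    QUERY,
    {"db": "example_db", "table": "users_tbl"},
)
print(request.get_method(), json.loads(request.data)["variables"])
# To send it, pass `request` to urllib.request.urlopen once the values are real.
```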
Clean up
To remove the resources created in this post, run the following command:
Refer to Part 1 to clean up the resources created in the first part of this series.
Conclusion
In this post, we showed how to implement a web application integrated with Lake Formation that uses a GraphQL API, built with AWS AppSync and Lambda, as its backend. You should now understand how to extend the capabilities of Lake Formation by building and integrating your own custom data processing applications.
Try out this solution for yourself, and share your feedback and questions in the comments.
About the Authors
Stefano Sandonà is a Senior Big Data Specialist Solution Architect at AWS. Passionate about data, distributed systems, and security, he helps customers worldwide architect high-performance, efficient, and secure data platforms.
Francesco Marelli is a Principal Solutions Architect at AWS. He specializes in the design, implementation, and optimization of large-scale data platforms. Francesco leads the AWS Solution Architect (SA) analytics team in Italy. He loves sharing his professional knowledge and is a frequent speaker at AWS events. Francesco is also passionate about music.