AWS Big Data Blog
Access Amazon Location Service from Amazon Redshift
Organizations typically store business and customer data in databases like Amazon Relational Database Service (Amazon RDS) and Amazon Redshift, and often want to enrich this data by integrating with external services. One such enrichment is to add spatial attributes such as location coordinates for an address. With the introduction of Amazon Location Service, you now have access to geospatial functionalities such as map visualization, geocoding, and reverse geocoding using data providers such as Esri and HERE.
Moreover, with the ability of Amazon Redshift and RDS databases to call AWS Lambda functions through user-defined functions (UDFs), you can now integrate these databases with geospatial functionality provided by Amazon Location Service. For more information about user-defined functions in Amazon Redshift, see Creating a scalar Lambda UDF.
In this post, we take customer addresses stored in an Amazon Redshift database and call Amazon Location Service geocoding APIs to find the coordinates of the addresses and persist them in the database. If you’re using Amazon RDS databases, see Access Amazon Location Service from Amazon Aurora.
Prerequisites
Before you get started, make sure you have the following prerequisites:
- An understanding of basic concepts such as AWS Identity and Access Management (IAM) roles and policies, Lambda functions, and Amazon Redshift.
- An Amazon Redshift cluster with a table that contains customer addresses and the following details: street number, street name, street type, municipality name, state or province code, postal or zip code, and country code. You also need a column to save the coordinates for the address.
- A SQL developer tool of your choice to connect to the Amazon Redshift cluster.
- An Amazon QuickSight account with access to Amazon Redshift.
Solution overview
Our solution consists of the following components:
- A Python Lambda function to call the
search_place_index_for_text
function. This function takes a text input and returns the coordinates as longitude and latitude values for each search result. - An IAM role to allow Lambda to call the
SearchPlaceIndexForText
operation for Amazon Location Service. - A Lambda UDF in the Amazon Redshift database to invoke the Lambda function.
- An IAM role to allow Amazon Redshift to invoke the Lambda function.
- SQL statements to update and select the coordinate data for records in the Amazon Redshift database by calling the Lambda UDF.
- A QuickSight dataset that uses the SQL statement to access the coordinate data.
- A QuickSight analysis that displays the address location in a geospatial chart.
The following diagram illustrates our solution architecture:
To implement the solution, we complete the following steps:
- Set up a place index resource for Amazon Location Service.
- Create a common address geocoding function.
- Invoke the Amazon Location Service API from Amazon Redshift.
- Create a Lambda function.
- Set up Amazon Redshift.
- Run SQL statements to invoke the Lambda function.
- Visualize the address locations in QuickSight.
Pay close attention to names and parameters used in this post; they must match and be consistent across all solution components.
The full code is available in GitHub. The code also includes an AWS CloudFormation template. Please scroll to the Appendix section for deploying the CloudFormation stack.
Set up a place index resource for Amazon Location Service
Amazon Location Service uses a place index resource for providing geocoding and reverse geocoding functionality. Let’s start by creating a new place index resource. Your account may have a default place index but we don’t use it for this post because it isn’t configured for storage.
- On the Amazon Location Service console, use the navigation pane to launch the wizard to create a new place index.
- For Name, enter
placeindex.redshift
. - For Data provider, select either of the data providers for your place index.
- For Data storage options, select Yes, results will be stored because we’re saving the geocoding results in a database field.
- Leave all other values as their default and choose Create place index.
We use the place index name in the Lambda code as a parameter to the search_place_index_for_text
API call.
Create a common address geocoding Lambda function
Next, let’s create a Lambda function to call the Amazon Location Service API. We reuse this function code for both Amazon Redshift and Amazon RDS because the underlying service call is the same.
Error handling has been skipped in this code snippet for brevity. The full code is available in GitHub.
We use country_code
to limit the search_place_index_for_text
API to a particular country using the ISO 3166-1 alpha-3 country code. Amazon Location Service throws an exception if an input is provided but doesn’t contain a valid value.
Create a new file geocode.py
with the following code snippet:
Invoke the Amazon Location Service API from Amazon Redshift
Each Amazon Redshift compute node invokes Lambda functions in parallel in order to return the results quickly. The function returns a success or fail status code along with the results. In case of a failure, an error message can be returned back.
The Amazon Redshift Lambda UDF invocation provides an arguments
list containing an ordered list of input parameters representing fields of the database records (see the following code). The function reads the input parameters by the list position and maps them to variables for processing the request.
The interface expects results to be returned in the same order and containing the same number of items as the request. See the following code:
In case of an exception, the function can return the fail status along with an error message:
Create a Lambda function
We now create a Lambda function GeocodeAddresses-Redshift
using the Python runtime.
Error handling has been skipped in this code snippet for brevity. The full code is available in GitHub.
- Create the file
geocode.py
as described in the previous section.
It’s important to set the timeout on the Lambda function to a longer duration to process multiple records in one call.
- Replace the default generated code in
lambda_function.py
with the following code:
- This function requires permission to call the
search_place_index_for_text
API to geocode addresses using the place indexplaceindex.redshift
that we created earlier. - Update the IAM role for the Lambda function to add the following inline policy
GeocodeAddresses-Redshift-policy
:
Set up Amazon Redshift
Now we create a new Lambda UDF in Amazon Redshift and configure it to use an IAM role that gives the permissions to invoke the specific Lambda function. For more information, see Creating a scalar Lambda UDF.
- Create an IAM role
Redshift-Lambda-role
and add the following inline policyRedshift-Lambda-policy
to this role to allow Amazon Redshift to invoke theGeocodeAddresses-Redshift
function that we created:
We need to associate this IAM role with the Amazon Redshift cluster.
- On the Amazon Redshift console, choose the cluster to use.
- Under Manage IAM Roles, add the
Redshift-Lambda-role
role to the cluster. - Wait for the cluster to be modified and be in
Available
status.
Now we create an Amazon Redshift external function to invoke a Lambda function.
- Use your preferred SQL editor to connect to your Amazon Redshift cluster and create a new external function
f_geocode_address
in Amazon Redshift using the following code:
This external function calls the Lambda function using the permissions provided in the Redshift-Lambda-role
role that we created.
Run SQL statements to invoke the Lambda function
We’re now ready to run SQL statements that can take address records from the customer_address
table in the Amazon Redshift database and geocode them using Amazon Location Service.
If you don’t have a customer_address
table, you can create it using the script included in the full code in GitHub.
You may choose to simply select the geocoding results as part of the SQL statement, as in the following code. It’s important to understand that Lambda functions called from SELECT SQL statements contain fields from multiple database records in each call. Therefore, the timeout on the function should be set to a longer duration to process multiple records in one call.
Geocoding results can be persisted in a separate database table in Amazon Redshift. We can then parse the saved JSON results to extract the geographic coordinates.
Now let’s extract the geographic coordinates from the results field.
Visualize the address location using QuickSight
Let’s configure a dataset in QuickSight and create an analysis for this dataset.
- Create a new Amazon Redshift data source
Redshift-Geocode
either using an auto-discovered or manually configured endpoint. - Create a new dataset for the data source using the preceding SELECT statement as a custom SQL.
Now we’re ready to configure our analysis.
- On the QuickSight console, create a new QuickSight analysis using the address location dataset.
- Choose the Point on map visual type.
- Choose the longitude and latitude fields from the fields list and drag them to the Geospatial field wells.
You should now see points on the map representing the customer address locations.
Conclusion
Congratulations! We have successfully integrated Amazon Location Service with Amazon Redshift and geocoded address data in an Amazon Redshift table without leaving our SQL environment. We enriched the address data by adding location coordinates. We also successfully visualized the address locations on a map in QuickSight. You can now explore extending the solution to other functionalities of Amazon Location Service such as reverse geocoding or even integrating with Lambda functions for any other custom functionality.
Appendix
To get started, deploy the CloudFormation stack to your AWS account.
- For stack name, enter a name for your stack for easy reference.
- For Redshift cluster, enter the cluster identifier for your Amazon Redshift cluster that contains the table with customer addresses.
- Accept I acknowledge that AWS CloudFormation might create IAM resources with custom names and choose Create stack.
About the author
Parag Srivastava is a Solutions Architect at Amazon Web Services (AWS), helping enterprise customers with successful cloud adoption and migration. During his professional career, he has been extensively involved in complex digital transformation projects. He is also passionate about building innovative solutions around geospatial aspects of addresses.