AWS Big Data Blog

Access Amazon Location Service from Amazon Redshift

Organizations typically store business and customer data in databases like Amazon Relational Database Service (Amazon RDS) and Amazon Redshift, and often want to enrich this data by integrating with external services. One such enrichment is to add spatial attributes such as location coordinates for an address. With the introduction of Amazon Location Service, you now have access to geospatial functionalities such as map visualization, geocoding, and reverse geocoding using data providers such as Esri and HERE.

Moreover, with the ability of Amazon Redshift and RDS databases to call AWS Lambda functions through user-defined functions (UDFs), you can now integrate these databases with geospatial functionality provided by Amazon Location Service. For more information about user-defined functions in Amazon Redshift, see Creating a scalar Lambda UDF.

In this post, we take customer addresses stored in an Amazon Redshift database and call Amazon Location Service geocoding APIs to find the coordinates of the addresses and persist them in the database. If you’re using Amazon RDS databases, see Access Amazon Location Service from Amazon Aurora.

Prerequisites

Before you get started, make sure you have the following prerequisites:

  • An understanding of basic concepts such as AWS Identity and Access Management (IAM) roles and policies, Lambda functions, and Amazon Redshift.
  • An Amazon Redshift cluster with a table that contains customer addresses and the following details: street number, street name, street type, municipality name, state or province code, postal or zip code, and country code. You also need a column to save the coordinates for the address.
  • A SQL developer tool of your choice to connect to the Amazon Redshift cluster.
  • An Amazon QuickSight account with access to Amazon Redshift.

Solution overview

Our solution consists of the following components:

  • A Python Lambda function to call the search_place_index_for_text function. This function takes a text input and returns the coordinates as longitude and latitude values for each search result.
  • An IAM role to allow Lambda to call the SearchPlaceIndexForText operation for Amazon Location Service.
  • A Lambda UDF in the Amazon Redshift database to invoke the Lambda function.
  • An IAM role to allow Amazon Redshift to invoke the Lambda function.
  • SQL statements to update and select the coordinate data for records in the Amazon Redshift database by calling the Lambda UDF.
  • A QuickSight dataset that uses the SQL statement to access the coordinate data.
  • A QuickSight analysis that displays the address location in a geospatial chart.

The following diagram illustrates our solution architecture:

To implement the solution, we complete the following steps:

  1. Set up a place index resource for Amazon Location Service.
  2. Create a common address geocoding function.
  3. Invoke the Amazon Location Service API from Amazon Redshift.
  4. Create a Lambda function.
  5. Set up Amazon Redshift.
  6. Run SQL statements to invoke the Lambda function.
  7. Visualize the address locations in QuickSight.

Pay close attention to names and parameters used in this post; they must match and be consistent across all solution components.

The full code is available in GitHub. The code also includes an AWS CloudFormation template. Please scroll to the Appendix section for deploying the CloudFormation stack.

Set up a place index resource for Amazon Location Service

Amazon Location Service uses a place index resource for providing geocoding and reverse geocoding functionality. Let’s start by creating a new place index resource. Your account may have a default place index but we don’t use it for this post because it isn’t configured for storage.

  1. On the Amazon Location Service console, use the navigation pane to launch the wizard to create a new place index.
  2. For Name, enter placeindex.redshift.
  3. For Data provider, select either of the data providers for your place index.
  4. For Data storage options, select Yes, results will be stored because we’re saving the geocoding results in a database field.
  5. Leave all other values as their default and choose Create place index.

We use the place index name in the Lambda code as a parameter to the search_place_index_for_text API call.

Create a common address geocoding Lambda function

Next, let’s create a Lambda function to call the Amazon Location Service API. We reuse this function code for both Amazon Redshift and Amazon RDS because the underlying service call is the same.

Error handling has been skipped in this code snippet for brevity. The full code is available in GitHub.

We use country_code to limit the search_place_index_for_text API to a particular country using the ISO 3166-1 alpha-3 country code. Amazon Location Service throws an exception if an input is provided but doesn’t contain a valid value.

Create a new file geocode.py with the following code snippet:

import boto3
import os

location = boto3.client("location")

def geocode_address(address_line, municipality_name, state_code, post_code, country_code):

    text = ("%s, %s %s %s" % (address_line, municipality_name, state_code, post_code))
    response = location.search_place_index_for_text(IndexName="placeindex.redshift", FilterCountries=[country_code], Text=text)

    data = response["Results"]
    if len(data) >= 1:
        response = {
            "Longitude": data[0]["Place"]["Geometry"]["Point"][0],
            "Latitude": data[0]["Place"]["Geometry"]["Point"][1],
            "Label": data[0]["Place"]["Label"],
            "MultipleMatch": False
        }
        
        if len(data) > 1:
            response["MultipleMatch"] = True
    else:
        response = {
            "Error": "No geocoding results found"
        }
        
    return response

Invoke the Amazon Location Service API from Amazon Redshift

Each Amazon Redshift compute node invokes Lambda functions in parallel in order to return the results quickly. The function returns a success or fail status code along with the results. In case of a failure, an error message can be returned back.

The Amazon Redshift Lambda UDF invocation provides an arguments list containing an ordered list of input parameters representing fields of the database records (see the following code). The function reads the input parameters by the list position and maps them to variables for processing the request.

{
   "arguments":[
      [
         "410 TERRY AVE N",
         "SEATTLE",
         "WA",
         "98109",
         "USA"
      ],
      [
         "510 W GEORGIA ST",
         "VANCOUVER",
         "BC",
         "V6B0M3",
         "CAN"
      ]
   ],
   "num_records":2
}

The interface expects results to be returned in the same order and containing the same number of items as the request. See the following code:

{
   "success":true,
   "results":[
      {
         "Longitude":-122.33664798659117,
         "Latitude":47.62231899780329,
         "Label":"410 Terry Ave N, Seattle, WA, 98109, USA",
         "MultipleMatch":false
      },
      {
         "Longitude":-123.11693997267491,
         "Latitude":49.281259993687854,
         "Label":"510 W Georgia St, Vancouver, British Columbia, V6B 0M3, CAN",
         "MultipleMatch":true
      }
   ]
}

In case of an exception, the function can return the fail status along with an error message:

{
  "success": false,
  "error_msg": "Error in processing request"
}

Create a Lambda function

We now create a Lambda function GeocodeAddresses-Redshift using the Python runtime.

Error handling has been skipped in this code snippet for brevity. The full code is available in GitHub.

  1. Create the file geocode.py as described in the previous section.

It’s important to set the timeout on the Lambda function to a longer duration to process multiple records in one call.

  1. Replace the default generated code in lambda_function.py with the following code:
import json
from geocode import *

def lambda_handler(event, context):

    try:
        response = dict()
        records = event["arguments"]
        results = []

        for record in records:
            address_line, municipality_name, state_code, post_code, country_code = record
            try:
                results.append(json.dumps(geocode_address(address_line, municipality_name, state_code, post_code, country_code)))
            except:
                results.append(None)
            
        response['success'] = True
        response['results'] = results
    except Exception as e:
        response['success'] = False
        response['error_msg'] = str(e)
    
    return json.dumps(response)
  1. This function requires permission to call the search_place_index_for_text API to geocode addresses using the place index placeindex.redshift that we created earlier.
  2. Update the IAM role for the Lambda function to add the following inline policy GeocodeAddresses-Redshift-policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "geo:SearchPlaceIndexForText"
            ],
            "Resource": "arn:aws:geo:<Region>:<AccountId>:place-index/placeindex.redshift",
            "Effect": "Allow"
        }
    ]
}

Set up Amazon Redshift

Now we create a new Lambda UDF in Amazon Redshift and configure it to use an IAM role that gives the permissions to invoke the specific Lambda function. For more information, see Creating a scalar Lambda UDF.

  1. Create an IAM role Redshift-Lambda-role and add the following inline policy Redshift-Lambda-policy to this role to allow Amazon Redshift to invoke the GeocodeAddresses-Redshift function that we created:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": [
                "arn:aws:lambda:<Region>:<AccountId>:function:GeocodeAddresses-Redshift"
            ]
        }
    ]
}

We need to associate this IAM role with the Amazon Redshift cluster.

  1. On the Amazon Redshift console, choose the cluster to use.
  2. Under Manage IAM Roles, add the Redshift-Lambda-role role to the cluster.
  3. Wait for the cluster to be modified and be in Available status.

Now we create an Amazon Redshift external function to invoke a Lambda function.

  1. Use your preferred SQL editor to connect to your Amazon Redshift cluster and create a new external function f_geocode_address in Amazon Redshift using the following code:
CREATE OR REPLACE EXTERNAL FUNCTION public.f_geocode_address(VARCHAR, VARCHAR,
                                                             VARCHAR, VARCHAR,
                                                             VARCHAR)
RETURNS VARCHAR
STABLE
LAMBDA 'GeocodeAddresses-Redshift'
IAM_ROLE 'arn:aws:iam::<AccountId>:role/Redshift-Lambda-role';

This external function calls the Lambda function using the permissions provided in the Redshift-Lambda-role role that we created.

Run SQL statements to invoke the Lambda function

We’re now ready to run SQL statements that can take address records from the customer_address table in the Amazon Redshift database and geocode them using Amazon Location Service.

If you don’t have a customer_address table, you can create it using the script included in the full code in GitHub.

You may choose to simply select the geocoding results as part of the SQL statement, as in the following code. It’s important to understand that Lambda functions called from SELECT SQL statements contain fields from multiple database records in each call. Therefore, the timeout on the function should be set to a longer duration to process multiple records in one call.

SELECT customer_id,
       f_geocode_address(address_line,
                         municipality_name,
                         state_code,
                         post_code,
                         country_code)
FROM customer_address
WHERE address_status = 'NEW'
  AND country_code IS NOT NULL;

Geocoding results can be persisted in a separate database table in Amazon Redshift. We can then parse the saved JSON results to extract the geographic coordinates.

CREATE TABLE customer_address_geocode AS
SELECT customer_id,
       address_line,
       municipality_name,
       state_code,
       post_code,
       country_code,
       f_geocode_address(address_line,
                         municipality_name,
                         state_code,
                         post_code,
                         country_code) AS geocode_result
FROM customer_address
WHERE address_status = 'NEW'
  AND country_code IS NOT NULL;

Now let’s extract the geographic coordinates from the results field.

SELECT customer_id,
    cast(json_extract_path_text(geocode_result, 'Longitude') as float) as Longitude,
    cast(json_extract_path_text(geocode_result, 'Latitude') as float) as Latitude
FROM customer_address_geocode;

Visualize the address location using QuickSight

Let’s configure a dataset in QuickSight and create an analysis for this dataset.

  1. Create a new Amazon Redshift data source Redshift-Geocode either using an auto-discovered or manually configured endpoint.
  2. Create a new dataset for the data source using the preceding SELECT statement as a custom SQL.

Now we’re ready to configure our analysis.

  1. On the QuickSight console, create a new QuickSight analysis using the address location dataset.
  2. Choose the Point on map visual type.

  1. Choose the longitude and latitude fields from the fields list and drag them to the Geospatial field wells.

You should now see points on the map representing the customer address locations.

Conclusion

Congratulations! We have successfully integrated Amazon Location Service with Amazon Redshift and geocoded address data in an Amazon Redshift table without leaving our SQL environment. We enriched the address data by adding location coordinates. We also successfully visualized the address locations on a map in QuickSight. You can now explore extending the solution to other functionalities of Amazon Location Service such as reverse geocoding or even integrating with Lambda functions for any other custom functionality.


Appendix

To get started, deploy the CloudFormation stack to your AWS account.

  1. For stack name, enter a name for your stack for easy reference.
  2. For Redshift cluster, enter the cluster identifier for your Amazon Redshift cluster that contains the table with customer addresses.
  3. Accept I acknowledge that AWS CloudFormation might create IAM resources with custom names and choose Create stack.

About the author

Parag Srivastava is a Solutions Architect at Amazon Web Services (AWS), helping enterprise customers with successful cloud adoption and migration. During his professional career, he has been extensively involved in complex digital transformation projects. He is also passionate about building innovative solutions around geospatial aspects of addresses.