AWS Big Data Blog
Gain insights from historical location data using Amazon Location Service and AWS analytics services
Many organizations around the world rely on the use of physical assets, such as vehicles, to deliver a service to their end-customers. By tracking these assets in real time and storing the results, asset owners can derive valuable insights on how their assets are being used to continuously deliver business improvements and plan for future changes. For example, a delivery company operating a fleet of vehicles may need to ascertain the impact from local policy changes outside of their control, such as the announced expansion of an Ultra-Low Emission Zone (ULEZ). By combining historical vehicle location data with information from other sources, the company can devise empirical approaches for better decision-making. For example, the company’s procurement team can use this information to make decisions about which vehicles to prioritize for replacement before policy changes go into effect.
Developers can use the support in Amazon Location Service for publishing device position updates to Amazon EventBridge to build a near-real-time data pipeline that stores locations of tracked assets in Amazon Simple Storage Service (Amazon S3). Additionally, you can use AWS Lambda to enrich incoming location data with data from other sources, such as an Amazon DynamoDB table containing vehicle maintenance details. Then a data analyst can use the geospatial querying capabilities of Amazon Athena to gain insights, such as the number of days their vehicles have operated in the proposed boundaries of an expanded ULEZ. Because vehicles that do not meet ULEZ emissions standards are subjected to a daily charge to operate within the zone, you can use the location data, along with maintenance data such as age of the vehicle, current mileage, and current emissions standards to estimate the amount the company would have to spend on daily fees.
This post shows how you can use Amazon Location, EventBridge, Lambda, Amazon Data Firehose, and Amazon S3 to build a location-aware data pipeline, and use this data to drive meaningful insights using AWS Glue and Athena.
Overview of solution
This is a fully serverless solution for location-based asset management. The solution consists of the following interfaces:
- IoT or mobile application – A mobile application or an Internet of Things (IoT) device allows the tracking of a company vehicle while it is in use and transmits its current location securely to the data ingestion layer in AWS. The ingestion approach is not in scope of this post. Instead, a Lambda function in our solution simulates sample vehicle journeys and directly updates Amazon Location tracker objects with randomized locations.
- Data analytics – Business analysts gather operational insights from multiple data sources, including the location data collected from the vehicles. Data analysts are looking for answers to questions such as, “How long did a given vehicle historically spend inside a proposed zone, and how much would the fees have cost had the policy been in place over the past 12 months?”
The following diagram illustrates the solution architecture.
The workflow consists of the following key steps:
- The tracking functionality of Amazon Location is used to track the vehicle. Using the EventBridge integration, filtered positional updates are published to an EventBridge event bus. This solution uses distance-based filtering to reduce costs and jitter. Distance-based filtering ignores location updates in which devices have moved less than 30 meters (98.4 feet).
- Amazon Location device position events arrive on the EventBridge default bus with source: ["aws.geo"] and detail-type: ["Location Device Position Event"]. One rule is created to forward these events to two downstream targets: a Lambda function and a Firehose delivery stream. A minimal event pattern matching these events is shown after this list.
- Two different patterns, based on each target, are described in this post to demonstrate different approaches to committing the data to an S3 bucket:
- Lambda function – The first approach uses a Lambda function to demonstrate how you can use code in the data pipeline to directly transform the incoming location data. You can modify the Lambda function to fetch additional vehicle information from a separate data store (for example, a DynamoDB table or a Customer Relationship Management system) to enrich the data, before storing the results in an S3 bucket. In this model, the Lambda function is invoked for each incoming event.
- Firehose delivery stream – The second approach uses a Firehose delivery stream to buffer and batch the incoming positional updates, before storing them in an S3 bucket without modification. This method uses GZIP compression to optimize storage consumption and query performance. You can also use the data transformation feature of Data Firehose to invoke a Lambda function to perform data transformation in batches.
- AWS Glue crawls both S3 bucket paths, populates the AWS Glue database tables based on the inferred schemas, and makes the data available to other analytics applications through the AWS Glue Data Catalog.
- Athena is used to run geospatial queries on the location data stored in the S3 buckets. The Data Catalog provides metadata that allows analytics applications using Athena to find, read, and process the location data stored in Amazon S3.
- This solution includes a Lambda function that continuously updates the Amazon Location tracker with simulated location data from fictitious journeys. The Lambda function is triggered at regular intervals using a scheduled EventBridge rule.
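For reference, a minimal EventBridge event pattern matching the device position events described in step 2 looks like the following:

```json
{
  "source": ["aws.geo"],
  "detail-type": ["Location Device Position Event"]
}
```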
You can test this solution yourself using the AWS Samples GitHub repository. The repository contains the AWS Serverless Application Model (AWS SAM) template and Lambda code required to try out this solution. Refer to the instructions in the README file for steps on how to provision and decommission this solution.
Visual layouts in some screenshots in this post may differ from those on your AWS Management Console.
Data generation
In this section, we discuss the steps to manually or automatically generate journey data.
Manually generate journey data
You can manually update device positions using the AWS Command Line Interface (AWS CLI) command aws location batch-update-device-position. Replace the tracker-name, device-id, Position, and SampleTime values with your own, and make sure that successive updates are more than 30 meters apart so that an event is placed on the default EventBridge event bus:
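The following is an illustrative invocation; the tracker name, device ID, coordinates, and timestamp are placeholder values, not values from the repository:

```bash
aws location batch-update-device-position \
    --tracker-name "ExampleTracker" \
    --updates '[
        {
            "DeviceId": "vehicle1",
            "Position": [-0.1278, 51.5074],
            "SampleTime": "2024-01-01T12:00:00Z"
        }
    ]'
```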
Automatically generate journey data using the simulator
The provided AWS CloudFormation template deploys an EventBridge scheduled rule and an accompanying Lambda function that simulates tracker updates from vehicles. This rule is enabled by default, and runs at a frequency specified by the SimulationIntervalMinutes CloudFormation parameter. The data generation Lambda function updates the Amazon Location tracker with a randomized position offset from the vehicles' base locations.
Vehicle names and base locations are stored in the vehicles.json file. A vehicle's starting position is reset each day, and base locations have been chosen so that vehicles drift in and out of the ULEZ on a given day, providing a realistic journey simulation.
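The exact schema is defined in the repository; a purely hypothetical entry might look like the following (the field names here are illustrative assumptions, not taken from the repo):

```json
[
    {
        "DeviceId": "vehicle1",
        "BasePosition": [-0.1278, 51.5074]
    }
]
```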
You can disable the rule temporarily by navigating to the scheduled rule details on the EventBridge console. Alternatively, change the parameter State: ENABLED to State: DISABLED for the scheduled rule resource GenerateDevicePositionsScheduleRule in the template.yml file. Rebuild and redeploy the AWS SAM template for this change to take effect.
Location data pipeline approaches
The configurations outlined in this section are deployed automatically by the provided AWS SAM template. The information in this section is provided to describe the pertinent parts of the solution.
Amazon Location device position events
Amazon Location sends device position update events to EventBridge in the following format:
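The following is a representative example of the event structure; the field values are illustrative, and the EventBridge documentation is authoritative for the exact schema:

```json
{
  "version": "0",
  "id": "12345678-1234-1234-1234-123456789012",
  "detail-type": "Location Device Position Event",
  "source": "aws.geo",
  "account": "111122223333",
  "time": "2024-01-01T12:00:00Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:geo:us-east-1:111122223333:tracker/ExampleTracker"
  ],
  "detail": {
    "EventType": "UPDATE",
    "TrackerName": "ExampleTracker",
    "DeviceId": "vehicle1",
    "SampleTime": "2024-01-01T12:00:00Z",
    "ReceivedTime": "2024-01-01T12:00:01Z",
    "Position": [-0.1278, 51.5074]
  }
}
```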
You can optionally specify an input transformation to modify the format and contents of the device position event data before it reaches the target.
Data enrichment using Lambda
Data enrichment in this pattern is facilitated through the invocation of a Lambda function. In this example, we call this function ProcessDevicePosition, and use a Python runtime. A custom transformation is applied in the EventBridge target definition to receive the event data in the following format:
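A transformed event might look like the following; this is a sketch of the flattened shape produced by the input transformer, not the exact payload from the repo:

```json
{
  "TrackerName": "ExampleTracker",
  "DeviceId": "vehicle1",
  "SampleTime": "2024-01-01T12:00:00Z",
  "ReceivedTime": "2024-01-01T12:00:01Z",
  "Position": [-0.1278, 51.5074]
}
```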
You could apply additional transformations, such as refactoring the Latitude and Longitude data into separate key-value pairs, if this is required by the downstream business logic processing the events.
The following code demonstrates the Python application logic that is run by the ProcessDevicePosition Lambda function. Error handling has been omitted from this code snippet for brevity. The full code is available in the GitHub repo.
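The following is a minimal sketch of that logic, assuming the bucket name and prefix arrive as environment variables; the variable names here are assumptions, and the repo's template.yml defines the actual configuration:

```python
import json
import os

import boto3

s3 = boto3.client("s3")

# Assumed environment variable names; the repo may use different ones.
BUCKET_NAME = os.environ["S3_BUCKET_NAME"]
BUCKET_PREFIX = os.environ["S3_BUCKET_LAMBDA_PREFIX"]


def lambda_handler(event, context):
    # The EventBridge input transformer delivers a flattened positional
    # update; DeviceId and SampleTime are used to build the object key.
    device_id = event["DeviceId"]
    sample_time = event["SampleTime"]

    # Write one object per event, prefixed by the device ID.
    key = f"{BUCKET_PREFIX}/{device_id}/{sample_time}.json"
    s3.put_object(
        Bucket=BUCKET_NAME,
        Key=key,
        Body=json.dumps(event).encode("utf-8"),
    )

    return {"statusCode": 200}
```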
The preceding code creates an S3 object for each device position event received by EventBridge. The code uses the DeviceId as a prefix to write the objects to the bucket.
You can add additional logic to the preceding Lambda function code to enrich the event data using other sources. The example in the GitHub repo demonstrates enriching the event with data from a DynamoDB vehicle maintenance table.
In addition to the prerequisite AWS Identity and Access Management (IAM) permissions provided by the AWSLambdaBasicExecutionRole managed policy, the ProcessDevicePosition function requires permissions to perform the S3 put_object action and any other actions required by the data enrichment logic. IAM permissions required by the solution are documented in the template.yml file.
Data pipeline using Amazon Data Firehose
Complete the following steps to create your Firehose delivery stream:
- On the Amazon Data Firehose console, choose Firehose streams in the navigation pane.
- Choose Create Firehose stream.
- For Source, choose Direct PUT.
- For Destination, choose Amazon S3.
- For Firehose stream name, enter a name (for this post, ProcessDevicePositionFirehose).
- Configure the destination settings with details about the S3 bucket in which the location data is stored, along with the partitioning strategy:
- Use <S3_BUCKET_NAME> and <S3_BUCKET_FIREHOSE_PREFIX> to determine the bucket and object prefixes.
- Use DeviceId as an additional prefix to write the objects to the bucket.
- Enable Dynamic partitioning and New line delimiter to make sure partitioning is automatic based on DeviceId, and that new line delimiters are added between records in objects that are delivered to Amazon S3.
These are required by AWS Glue to later crawl the data, and for Athena to recognize individual records.
Create an EventBridge rule and attach targets
The EventBridge rule ProcessDevicePosition defines two targets: the ProcessDevicePosition Lambda function, and the ProcessDevicePositionFirehose delivery stream. Complete the following steps to create the rule and attach the targets:
- On the EventBridge console, create a new rule.
- For Name, enter a name (for this post, ProcessDevicePosition).
- For Event bus, choose default.
- For Rule type, select Rule with an event pattern.
- For Event source, select AWS events or EventBridge partner events.
- For Method, select Use pattern form.
- In the Event pattern section, specify AWS services as the source, Amazon Location Service as the specific service, and Location Device Position Event as the event type.
- For Target 1, attach the ProcessDevicePosition Lambda function as a target.
- We use Input transformer to customize the event that is committed to the S3 bucket.
- Configure Input paths map and Input template to organize the payload into the desired format.
- The following code is the input paths map:
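A paths map along these lines extracts the relevant fields from the event detail; this is a sketch, and the repo's template.yml contains the exact mapping:

```json
{
  "EventType": "$.detail.EventType",
  "TrackerName": "$.detail.TrackerName",
  "DeviceId": "$.detail.DeviceId",
  "SampleTime": "$.detail.SampleTime",
  "ReceivedTime": "$.detail.ReceivedTime",
  "Position": "$.detail.Position"
}
```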
- The following code is the input template:
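The template then reassembles those values into the flattened document shown earlier (again, a sketch rather than the exact template from the repo):

```json
{
  "TrackerName": "<TrackerName>",
  "DeviceId": "<DeviceId>",
  "SampleTime": "<SampleTime>",
  "ReceivedTime": "<ReceivedTime>",
  "Position": <Position>
}
```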
- For Target 2, choose the ProcessDevicePositionFirehose delivery stream as a target.
This target requires an IAM role that allows one or multiple records to be written to the Firehose delivery stream:
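A policy statement along these lines grants the necessary access; the Region, account ID, and resource ARN here are illustrative:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "firehose:PutRecord",
        "firehose:PutRecordBatch"
      ],
      "Resource": "arn:aws:firehose:us-east-1:111122223333:deliverystream/ProcessDevicePositionFirehose"
    }
  ]
}
```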
Crawl and catalog the data using AWS Glue
After sufficient data has been generated, complete the following steps:
- On the AWS Glue console, choose Crawlers in the navigation pane.
- Select the crawlers that have been created, location-analytics-glue-crawler-lambda and location-analytics-glue-crawler-firehose.
- Choose Run.
The crawlers will automatically classify the data into JSON format, group the records into tables and partitions, and commit associated metadata to the AWS Glue Data Catalog.
- When the Last run status of both crawlers shows as Succeeded, confirm that two tables (lambda and firehose) have been created on the Tables page.
The solution partitions the incoming location data based on the deviceid field. Therefore, as long as there are no new devices or schema changes, the crawlers don't need to run again. However, if new devices are added, or a different field is used for partitioning, the crawlers need to run again.
You’re now ready to query the tables using Athena.
Query the data using Athena
Athena is a serverless, interactive analytics service built to analyze unstructured, semi-structured, and structured data where it is hosted. If this is your first time using the Athena console, follow the instructions to set up a query result location in Amazon S3. To query the data with Athena, complete the following steps:
- On the Athena console, open the query editor.
- For Data source, choose AwsDataCatalog.
- For Database, choose location-analytics-glue-database.
- On the options menu (three vertical dots), choose Preview Table to query the content of both tables.
The query displays 10 sample positional records currently stored in the table. The following screenshot is an example from previewing the firehose table. The firehose table stores raw, unmodified data from the Amazon Location tracker.
You can now experiment with geospatial queries. The GeoJSON file for the 2021 London ULEZ expansion is part of the repository, and has already been converted into a query compatible with both Athena tables.
- Copy and paste the content from the 1-firehose-athena-ulez-2021-create-view.sql file found in the examples/firehose folder into the query editor.
This query uses the ST_Within geospatial function to determine whether a recorded position is inside or outside the ULEZ zone defined by the polygon. A new view called ulezvehicleanalysis_firehose is created with a new column, insidezone, which captures whether the recorded position exists within the zone.
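The view definition follows this general shape. This is a simplified sketch that assumes the crawler exposes deviceid, sampletime, and position columns; the repository file matches the actual inferred schema and contains the full polygon WKT generated from the GeoJSON:

```sql
CREATE OR REPLACE VIEW ulezvehicleanalysis_firehose AS
SELECT
    deviceid,
    sampletime,
    position,
    -- Tracker positions are [longitude, latitude]; arrays are 1-indexed
    ST_Within(
        ST_Point(position[1], position[2]),
        ST_Polygon('polygon ((-0.1000 51.5000, ...))')  -- WKT elided
    ) AS insidezone
FROM firehose;
```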
A simple Python utility is provided, which converts the polygon features found in the downloaded GeoJSON file into ST_Polygon strings based on the well-known text (WKT) format, which can be used directly in an Athena query.
- Choose Preview View on the ulezvehicleanalysis_firehose view to explore its content.
You can now run queries against this view to gain overarching insights.
- Copy and paste the content from the 2-firehose-athena-ulez-2021-query-days-in-zone.sql file found in the examples/firehose folder into the query editor.
This query establishes the total number of days each vehicle has entered the ULEZ, and what the expected total charges would be. The query has been parameterized using the ? placeholder character. Parameterized queries allow you to rerun the same query with different parameter values.
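The query takes roughly this form; this is a sketch over the view created earlier, with the ? replaced by the daily fee at execution time, and the repository file is authoritative:

```sql
SELECT
    deviceid,
    COUNT(DISTINCT date(from_iso8601_timestamp(sampletime))) AS days_in_zone,
    COUNT(DISTINCT date(from_iso8601_timestamp(sampletime))) * ? AS total_charges
FROM ulezvehicleanalysis_firehose
WHERE insidezone = true
GROUP BY deviceid
ORDER BY days_in_zone DESC;
```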
- Enter the daily fee amount for Parameter 1, then run the query.
The results display each vehicle, the total number of days spent in the proposed ULEZ, and the total charges based on the daily fee you entered.
You can repeat this exercise using the lambda table. Data in the lambda table is augmented with the additional vehicle details present in the vehicle maintenance DynamoDB table at the time the event is processed by the Lambda function. The solution supports the following fields:
- MeetsEmissionStandards (Boolean)
- Mileage (Number)
- PurchaseDate (String, in YYYY-MM-DD format)
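The console steps below create such a record; if you prefer to script it instead, a boto3 sketch like the following would do the same (the table name is a placeholder for the stack output):

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Placeholder: use the VehicleMaintenanceDynamoTable output from the
# deployed CloudFormation stack as the table name.
table = dynamodb.Table("<VEHICLE_MAINTENANCE_TABLE_NAME>")

# Field names and types match those listed above.
table.put_item(
    Item={
        "DeviceId": "vehicle1",
        "PurchaseDate": "2005-10-01",
        "Mileage": 10000,
        "MeetsEmissionStandards": False,
    }
)
```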
You can also enrich the new data as it arrives.
- On the DynamoDB console, find the vehicle maintenance table under Tables. The table name is provided as the output VehicleMaintenanceDynamoTable in the deployed CloudFormation stack.
- Choose Explore table items to view the content of the table.
- Choose Create item to create a new record for a vehicle.
- Enter DeviceId (such as vehicle1 as a String), PurchaseDate (such as 2005-10-01 as a String), Mileage (such as 10000 as a Number), and MeetsEmissionStandards (with a value such as False as a Boolean).
- Choose Create item to create the record.
- Duplicate the newly created record with additional entries for other vehicles (such as vehicle2 or vehicle3), modifying the values of the attributes slightly each time.
- Rerun the location-analytics-glue-crawler-lambda AWS Glue crawler after new data has been generated to confirm that the update to the schema with new fields is registered.
- Copy and paste the content from the 1-lambda-athena-ulez-2021-create-view.sql file found in the examples/lambda folder into the query editor.
- Preview the ulezvehicleanalysis_lambda view to confirm that the new columns have been created.
If errors such as Column 'mileage' cannot be resolved are displayed, the data enrichment is not taking place, or the AWS Glue crawler has not yet detected updates to the schema.
If the Preview table option only returns results from before you created records in the DynamoDB table, return the query results in descending order using sampletime (for example, order by sampletime desc limit 100;).
Now we focus on the vehicles that don't currently meet emissions standards, and order the vehicles in descending order based on mileage per year (calculated as the latest mileage divided by the age of the vehicle in years).
- Copy and paste the content from the 2-lambda-athena-ulez-2021-query-days-in-zone.sql file found in the examples/lambda folder into the query editor.
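The query follows this general shape; this is a sketch over the enriched view, and the repository file is authoritative:

```sql
SELECT
    deviceid,
    COUNT(DISTINCT date(from_iso8601_timestamp(sampletime))) AS days_in_zone,
    -- Mileage per year: latest mileage / vehicle age in years (min 1)
    max(mileage) / greatest(
        date_diff('year', date(max(purchasedate)), current_date), 1
    ) AS mileage_per_year
FROM ulezvehicleanalysis_lambda
WHERE insidezone = true
  AND meetsemissionstandards = false
GROUP BY deviceid
ORDER BY mileage_per_year DESC;
```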
In this example, we can see that out of our fleet of vehicles, five have been reported as not meeting emission standards. We can also see the vehicles that have accumulated high mileage per year, and the number of days spent in the proposed ULEZ. The fleet operator may now decide to prioritize these vehicles for replacement. Because location data is enriched with the most up-to-date vehicle maintenance data at the time it is ingested, you can further evolve these queries to run over a defined time window. For example, you could factor in mileage changes within the past year.
Due to the dynamic nature of the data enrichment, any new data committed to Amazon S3, along with the corresponding query results, will change as records are updated in the DynamoDB vehicle maintenance table.
Clean up
Refer to the instructions in the README file to clean up the resources provisioned for this solution.
Conclusion
This post demonstrated how you can use Amazon Location, EventBridge, Lambda, Amazon Data Firehose, and Amazon S3 to build a location-aware data pipeline, and use the collected device position data to drive analytical insights using AWS Glue and Athena. By tracking these assets in real time and storing the results, companies can derive valuable insights on how effectively their fleets are being utilized and better react to changes in the future. You can now explore extending this sample code with your own device tracking data and analytics requirements.
About the Authors
Alan Peaty is a Senior Partner Solutions Architect at AWS. Alan helps Global Systems Integrators (GSIs) and Global Independent Software Vendors (GISVs) solve complex customer challenges using AWS services. Prior to joining AWS, Alan worked as an architect at systems integrators to translate business requirements into technical solutions. Outside of work, Alan is an IoT enthusiast and a keen runner who loves to hit the muddy trails of the English countryside.
Parag Srivastava is a Solutions Architect at AWS, helping enterprise customers with successful cloud adoption and migration. During his professional career, he has been extensively involved in complex digital transformation projects. He is also passionate about building innovative solutions around geospatial aspects of addresses.