AWS Cloud Operations Blog

Analyzing Amazon CloudWatch Internet Monitor measurement logs using Amazon Athena and Amazon QuickSight

Overview

In this blog post, we describe how to use Amazon Athena with Amazon CloudWatch Internet Monitor’s measurement information stored on Amazon S3. You can use Athena to analyze event logs, and then build a dashboard to visualize the measurement logs and gain insights by using Amazon QuickSight.

Internet Monitor publishes measurements to CloudWatch Logs for the city-networks (locations and ASNs) that are specific to your application, for the top 500 city-networks (by traffic volume). These measurements – the performance score, availability score, monitored bytes transferred (bytes in and bytes out), and round-trip time – provide visibility into how internet issues impact the performance and availability between your applications hosted on AWS and your end users. You can also choose to publish the measurements for all your monitored city-networks to Amazon S3, and then use different AWS analytics services to view and analyze the information.

After you create a monitor in Internet Monitor, you can use the dashboard in Amazon CloudWatch to filter and view information for the top 500 city-networks to visualize and get insights about health events, performance, and more. When you opt to publish all measurements to S3, and then use analytics services to build a dashboard with the information, you have the flexibility to do deep dives with all of your Internet Monitor measurements, and analyze and visualize results for insights. Using services such as Athena and QuickSight, you can add to and enhance your analysis and visualizations to learn even more about your application’s performance, availability, and activity.

Diagram illustrating Internet Monitor’s access from the CloudWatch dashboard, publishing to S3, and integration with analytics services.

Figure 1. Diagram illustrating CloudWatch Internet Monitor’s access from the CloudWatch dashboard, publishing to S3, and integration with analytics services.

Before you start

The steps here assume that you’ve already created a monitor in Internet Monitor in your AWS account. If you don’t have a monitor, follow the instructions in the Internet Monitor chapter of the Amazon CloudWatch User Guide to create one, by using the steps in the console or the AWS CLI. Make sure that you set up the monitor to publish measurements to Amazon S3 when you create it, or edit your monitor to turn on the feature.

Analyze measurement events using Amazon Athena and Amazon QuickSight

If you choose to use Amazon QuickSight to create a dashboard to visualize your Internet Monitor measurement events, the first step is to pull the measurement events from S3 into a table in Amazon Athena. Then, you can create standard SQL queries for the specific measurements that you want to get insights about, for example, specific client locations and ASNs, or only the client locations affected by specific health events.

Create an external table in Athena

Get started by creating a table in Athena for your Internet Monitor measurement events. Then, create sample queries to pull details about groups of events information that you’d like to view in a dashboard, with QuickSight.

To use Athena to create a table of events data for your Internet Monitor measurements:

  1. Open the Athena console at https://console.thinkwithwp.com/athena/
  2. In the Athena query editor, enter the following command, and then choose Run Query. For the LOCATION parameter, enter your own information for the S3 bucket name, prefix, account, and Region.
CREATE EXTERNAL TABLE internet_measurements (
    version INT,
    timestamp INT,
    clientlocation STRING,
    servicelocation STRING,
    percentageoftotaltraffic DOUBLE,
    bytesin INT,
    bytesout INT,
    clientconnectioncount INT,
    internethealth STRING,
    trafficinsights STRING
)
PARTITIONED BY (year STRING, month STRING, day STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://ExampleBucketName/ExamplePrefix/AWSLogs/YourAWSAccountID/internetmonitor/YourRegion/'
TBLPROPERTIES ('skip.header.line.count' = '1');

The command creates a table of events data called internet_measurements.

Querying internet measurements events data in Athena

With the table of events data, you can use standard SQL to build custom queries in Athena to return a variety of information about your application’s internet measurements events. To learn details about using SQL with Athena, see Querying Data in Amazon Athena Tables. The example Athena queries provided here can give you an idea about what’s possible and help you get started. For example, you might start by querying to return the top client locations and ASNs (internet service providers or ISPs) that access your application as a percentage of your total application traffic.

To return this information, you enter a new query in the Athena query editor, as shown in the following screenshot.

Example query of the internet measurements table in the query editor in Athena.

Figure 2. Example query of the internet measurements table in the query editor in Athena.

The following are example queries that you might find helpful to use with your Internet Monitor measurements data.

Example 1: Top 10 (by percentage of total traffic) client location cities and ASNs impacted by a health event

SELECT json_extract_scalar(clientLocation, '$.city') as city,
    json_extract_scalar(clientLocation, '$.networkname') as networkName,
    sum(percentageoftotaltraffic) as percentageoftotaltraffic
FROM internet_measurements
GROUP BY json_extract_scalar(clientLocation, '$.city'),
    json_extract_scalar(clientLocation, '$.networkname')
ORDER BY percentageoftotaltraffic desc
limit 10

Example 2: Top 10 locations and ASNs impacted by an availability impact

SELECT json_extract_scalar(clientLocation, '$.city') as city,
    json_extract_scalar(clientLocation, '$.networkname') as networkName,
    sum(
        cast(
            json_extract_scalar(
                internetHealth,
                '$.availability.percentageoftotaltrafficimpacted'
            )
        as double ) 
    ) as percentageOfTotalTrafficImpacted
FROM internet_measurements
GROUP BY json_extract_scalar(clientLocation, '$.city'),
    json_extract_scalar(clientLocation, '$.networkname')
ORDER BY percentageOfTotalTrafficImpacted desc
limit 10

Example 3: Top 100 (by total byes transferred) client locations impacted

SELECT json_extract_scalar(clientLocation, '$.city') as city,
    json_extract_scalar(clientLocation, '$.subdivision') as subdivision,
    json_extract_scalar(clientLocation, '$.country') as country,
    avg(cast(json_extract_scalar(internetHealth, '$.availability.experiencescore') as double)) as availabilityScore,
    avg(cast(json_extract_scalar(internetHealth, '$.performance.experiencescore') as double)) performanceScore,
    avg(cast(json_extract_scalar(trafficinsights, '$.timetofirstbyte.currentexperience.value') as double)) as averageTTFB,
    sum(bytesIn) as bytesIn,
    sum(bytesOut) as bytesOut,
    sum(bytesIn + bytesOut) as totalBytes
FROM internet_measurements
GROUP BY 
json_extract_scalar(clientLocation, '$.city'),
    json_extract_scalar(clientLocation, '$.subdivision'),
    json_extract_scalar(clientLocation, '$.country')
ORDER BY totalBytes desc
limit 100

When you create a query in Athena, keep the following in mind:

  • To extract data from nested JSON fields, use one of the Presto functions, json_extract or json_extract_scalar. For more information, see Extracting data from JSON.
  • Enter all characters for JSON fields in lower case.

You can save queries in Athena if you want to be able to run the same queries at a later time. You can also download the results of your queries in CSV format, to view or analyze the information in a spreadsheet program, for example. For more information, see Downloading query results files using the Athena console.

Visualize measurement events in Amazon QuickSight

After you set up an events table in Athena, you can use Amazon QuickSight to visualize data in an Athena table with just a few clicks. In this section, we’ll step through how to use QuickSight to work with data that’s located in an Athena table.

Before you connect QuickSight to Athena, make sure to grant QuickSight access to both Athena and the associated S3 buckets in your account that have the measurement events files from Internet Monitor. You can follow the steps to configure access in the QuickSight User Guide.

To set up visualizations for your data in QuickSight, follow these steps:

The data sources available in Athena.

Figure 3: The data sources available in Athena.

1.Log in to QuickSight, and then choose Datasets.

2.Choose New dataset, and then choose Athena as a new data source.

The custom SQL query editor in Amazon QuickSight.

Figure 4: The custom SQL query editor in Amazon QuickSight.

3.Enter a name for the new dataset, and then choose Validate connection.

4.Choose Create data source.

5.Select Use custom SQL, and then enter a name for the SQL query.

6.Enter a query that returns information that you want to visualize for your measurement events, as shown in the screenshot.

For example, you can enter the following query:

SELECT json_extract_scalar(clientLocation, '$.city') as city,
json_extract_scalar(clientLocation, '$.country') as country,
json_extract_scalar(clientLocation, '$.subdivision') as state,
    json_extract_scalar(clientLocation, '$.networkname') as networkName,
    sum(percentageoftotaltraffic) as percentageoftotaltraffic
FROM internet_measurements
GROUP BY json_extract_scalar(clientLocation, '$.city'),
json_extract_scalar(clientLocation, '$.country'),
json_extract_scalar(clientLocation, '$.subdivision') ,
    json_extract_scalar(clientLocation, '$.networkname')
ORDER BY percentageoftotaltraffic desc

7.Choose Confirm query.

8.Import to SPICE for quicker analytics

Now that you have imported your data into your analysis, you can apply a visualization. A visualization is a type of chart, which you can customize based on what will help you understand and analyze the data.

To create a visualization in QuickSight:

  1. In Amazon QuickSight, choose New analysis.
  2. Select the dataset that you created, and then choose Use in analysis.
  3. At the bottom left of the screen, choose Line Chart.
  4. At the bottom left of the screen, choose Horizontal bar chart.
  5. Drag and drop the country field to the Y-Axis
  6. Drag and drop the percentageoftotaltraffic field to the Value.

A graph similar to the following should be displayed.

An example graph that shows application traffic for the top countries for an application, sorted by top traffic.

Figure 5: An example graph that shows application traffic for the top countries for an application, sorted by top traffic.

You can also map the percentage of total traffic by client location. In QuickSight, for Visualization type, choose Points on map, and then, under Field wells, for the Geospatial data point to visualize, choose city. The following figure shows a screenshot of this area of QuickSights.

Example of a map visualization showing top cities by total percentage of traffic

Figure 6: Example of a map visualization showing top cities by total percentage of traffic

Summary

In this post, we described how you can analyze and visualize Amazon CloudWatch Internet Monitor internet measurements events data by using Amazon Athena and Amazon QuickSight. With these services, you can publish, store, analyze, and visualize your internet measurements events data flexibly and efficiently. You can use Athena to query events data stored in Amazon S3 by using standard SQL. You can use Amazon QuickSight to create interactive dashboards and charts to explore your data. With these tools, you can leverage the Internet Monitor measurements data you store in S3 to gain more insights about your application’s performance and usage.

About the authors

Xulong Gao

As a Senior Solution Architect for Key Accounts at Amazon Web Services, Xulong has played a pivotal role in the global launches of numerous esteemed game studios. His expertise spans Game Tech, Global Network Optimization, Application Delivery, Network and Application Security, and Content Delivery Networks (CDN).

Abdul Kader Maliyakkal

Abdul is a Senior Software Development Engineer at Amazon Web Services. He is passionate about building and scaling systems, and creating tools for developers.

Richi Kumari

Richi is a Senior Product Manager at Amazon Web Services and contributes to the design of product strategies and requirements that effectively meet the needs of customers.