AWS Cloud Operations Blog

Consolidate and query AWS CloudTrail data across accounts and regions using AWS CloudTrail Lake

AWS CloudTrail tracks user and API activity across your AWS infrastructure. AWS CloudTrail best practices recommend that customers set up separate trails for different use cases, such as operational troubleshooting, auditing, and security monitoring. Once a use case is accomplished, customers might permanently delete some of the trails but choose to retain the Amazon S3 buckets that store the CloudTrail events, either to meet compliance or auditing requirements or for future deep dives.

Customers use AWS CloudTrail Lake to provide their teams with a fully managed, central query mechanism for CloudTrail events across accounts and regions. AWS CloudTrail Lake starts recording events from the time you create an event data store, with the option of importing existing logs. Customers can choose not to import historical CloudTrail data at creation but might later need it to perform investigations. Customers may also want to import a small subset of past data rather than all of the event logs available in the S3 bucket.

In this blog post, I demonstrate how to set up a centralized CloudTrail Lake event data store that consolidates historical CloudTrail event logs with new CloudTrail events. The post leverages CloudTrail Lake’s ability to import data directly from S3 buckets for a desired time range and augment the data already residing in the lake. Once you have created a consolidated event data store in CloudTrail Lake, you can use it to run queries on all of your logs, including events brought over from your S3 buckets.

Prerequisites

  • Access to the management account of your AWS organization.
  • An existing Amazon S3 bucket that contains CloudTrail logs delivered by an AWS CloudTrail trail.

Walkthrough

Step 1: Create a multi-account, multi-region event data store

a. Navigate to the CloudTrail console and choose Lake in the left navigation pane. On the Lake page, open the Event data stores tab and choose Create event data store. On the Configure event data store page, under General details, enter a name for the event data store (for example, import-existing-logs-lake). Keep the defaults for the rest and choose Next.

b. On the Choose events page, under CloudTrail events, select Enable for all accounts in my organization. Keep the remaining options at their defaults and choose Next. On the Review and create page, choose Create event data store.

Figure showcasing event data store configuration

c. The event data store status shows as Creation in progress, which soon changes to Enabled. Choose the event data store you just created and note the Event data store ARN.
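If you prefer to script this setup, the following is a minimal boto3 (AWS SDK for Python) sketch that is equivalent to steps a through c. The region, retention period, and termination protection values are assumptions; adjust them to your needs.

import boto3

# Create a multi-region, organization-wide event data store (equivalent to steps a-c).
cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")  # region is an assumption

response = cloudtrail.create_event_data_store(
    Name="import-existing-logs-lake",
    MultiRegionEnabled=True,            # include events from all regions
    OrganizationEnabled=True,           # "Enable for all accounts in my organization"
    RetentionPeriod=2555,               # retention in days; pick a value that fits your requirements
    TerminationProtectionEnabled=True,
)

# Note the ARN; you will need it for the IAM policies in Step 2.
print(response["EventDataStoreArn"])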

d. To generate sample CloudTrail events, I created and deleted S3 buckets across accounts and regions.

e. To query your event data store, navigate to the CloudTrail console, choose Lake in the left navigation pane, and then choose Query. In the right pane, choose the Editor tab. The query below lists all the S3 buckets that I created. Be sure to replace ENTEREVENTDATASTOREID with your actual event data store ID.

SELECT eventName, eventTime, recipientAccountId, awsRegion
FROM ENTEREVENTDATASTOREID
WHERE eventName='CreateBucket'
ORDER BY eventTime ASC

Figure shows query results from event data store

Please note that the results contain only the S3 buckets created after the event data store was created.
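You can also run the same query programmatically rather than through the Lake query editor. The following is a minimal boto3 sketch; the region, polling interval, and placeholder event data store ID are assumptions.

import time
import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

EDS_ID = "ENTEREVENTDATASTOREID"  # replace with your event data store ID

query = f"""
SELECT eventName, eventTime, recipientAccountId, awsRegion
FROM {EDS_ID}
WHERE eventName = 'CreateBucket'
ORDER BY eventTime ASC
"""

# Start the query, wait for it to finish, then page through the result rows.
query_id = cloudtrail.start_query(QueryStatement=query)["QueryId"]

while cloudtrail.describe_query(QueryId=query_id)["QueryStatus"] in ("QUEUED", "RUNNING"):
    time.sleep(2)

next_token = None
while True:
    kwargs = {"QueryId": query_id}
    if next_token:
        kwargs["NextToken"] = next_token
    results = cloudtrail.get_query_results(**kwargs)
    for row in results.get("QueryResultRows", []):
        print(row)
    next_token = results.get("NextToken")
    if not next_token:
        break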

Let’s import the last 120 days of data from an existing S3 bucket.

Step 2: Import existing CloudTrail logs from S3 into CloudTrail Lake

Please refer to the Considerations section of the Working with CloudTrail Lake documentation before proceeding with this section. Also note that with AWS CloudTrail Lake you pay for ingestion and storage together, while querying is billed separately on a pay-as-you-go basis. Find more details on AWS CloudTrail Lake pricing at the AWS CloudTrail pricing page by navigating to Paid Tier in the left hand panel and then choosing the Lake tab.

CloudTrail Lake needs the right permissions to copy existing trail events from the S3 bucket to the destination event data store. To grant those permissions, follow the steps below:

a. Set up an IAM role with the trust policy below. Replace the values for aws:SourceArn and aws:SourceAccount.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudtrail.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceArn": "Enter your Event Data Store ARN here",
                    "aws:SourceAccount": "Enter Event Data Store account number"
                }
            }
        }
    ]
}

Then attach the policy below to the role. Replace the values for Resource, aws:SourceArn, and aws:SourceAccount.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AWSCloudTrailImportBucketAccess",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketAcl"],
      "Resource": [
        "Enter ARN of existing Trail S3 bucket"
      ],
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "Enter Event Data Store account number",
          "aws:SourceArn": "Enter your Event Data Store ARN here"
         }
       }
    },
    {
      "Sid": "AWSCloudTrailImportObjectAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": [
        "Enter ARN of existing Trail S3 bucket", 
        "Enter ARN of existing Trail S3 bucket/*" 
      ],
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "Enter Event Data Store account number",
          "aws:SourceArn": "Enter your Event Data Store ARN here"
         }
       }
    }
  ]
}
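If you would rather script the role creation, here is a minimal boto3 sketch. The role name, policy name, and file names are assumptions, and the sketch presumes you have saved the two policy documents above as local files with the placeholder values filled in.

import pathlib
import boto3

iam = boto3.client("iam")

# Create the import role with the trust policy shown above.
role = iam.create_role(
    RoleName="CloudTrailLakeImportRole",                                   # assumed name
    AssumeRolePolicyDocument=pathlib.Path("trust-policy.json").read_text(),
)

# Attach the S3 permissions policy shown above as an inline policy.
iam.put_role_policy(
    RoleName="CloudTrailLakeImportRole",
    PolicyName="CloudTrailLakeImportPolicy",                               # assumed name
    PolicyDocument=pathlib.Path("import-policy.json").read_text(),
)

# The role ARN is needed for the bucket policy in Step 2b and the copy operation in Step 2c.
print(role["Role"]["Arn"])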

b. Update the bucket policy for the existing trail S3 bucket. Replace the values for Principal and Resource.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": "Enter the ARN of IAM role created in Step 2a"
            },
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketAcl",
                "s3:GetObject"
            ],
            "Resource": [
                "Enter ARN of existing Trail S3 bucket/*",
                "Enter ARN of existing Trail S3 bucket"
            ]
        }
    ]
}
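A minimal boto3 sketch for applying the bucket policy follows; the bucket name and file name are assumptions. Note that put_bucket_policy replaces the bucket's entire policy, so merge the statement above into the bucket's existing policy (which includes the statements that allow CloudTrail to deliver logs) before applying it.

import pathlib
import boto3

s3 = boto3.client("s3")

# Apply the merged bucket policy, saved locally with the placeholder values filled in.
s3.put_bucket_policy(
    Bucket="your-existing-trail-bucket",                      # assumed bucket name
    Policy=pathlib.Path("bucket-policy.json").read_text(),
)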

c. Navigate to the CloudTrail console and choose Lake in the left navigation pane. On the Event data stores tab, select the event data store that you created earlier. From the Actions dropdown in the top right-hand corner, choose Copy trail events.

Figure shows Copy trail events drop down from Actions menu

d. On the Copy trail events page, under Choose trail event source, choose Enter S3 URI. Under S3 URI, browse to the path of your S3 bucket. Under Specify a time range of events, provide a time range.

Figure shows Copy trail events configuration

e. Under Delivery location, choose the event data store you created earlier, and under Permissions, choose the IAM role you created in Step 2a. Choose Copy events.

f. In the dialog box that appears, select Copy trail events to Lake and then choose Copy events. To view the status of the copy operation, navigate to Event copy status on your event data store details page. When the events have been copied successfully, the copy status changes to Completed.
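Steps 2c through 2f can also be performed with the StartImport API. The following is a minimal boto3 sketch; the ARNs, S3 URI, and region are placeholders that you would replace with your own values.

from datetime import datetime, timedelta, timezone
import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

# Import the last 120 days of events, as in this walkthrough.
end = datetime.now(timezone.utc)
start = end - timedelta(days=120)

import_id = cloudtrail.start_import(
    Destinations=["arn:aws:cloudtrail:us-east-1:111122223333:eventdatastore/EXAMPLE-ID"],  # event data store ARN
    ImportSource={"S3": {
        "S3LocationUri": "s3://your-existing-trail-bucket/AWSLogs/",                       # prefix that holds the trail logs
        "S3BucketRegion": "us-east-1",
        "S3BucketAccessRoleArn": "arn:aws:iam::111122223333:role/CloudTrailLakeImportRole",
    }},
    StartEventTime=start,
    EndEventTime=end,
)["ImportId"]

# Check progress; the import status moves to COMPLETED when the copy finishes.
print(cloudtrail.get_import(ImportId=import_id)["ImportStatus"])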

Step 3: Query historical and recent data across multiple regions and accounts

Once the copy is complete, the consolidated event data store is available to aggregate data across accounts and regions and to spot trends. For example, the query below aggregates the number of S3 buckets created per month across accounts and regions.

SELECT recipientAccountId AS Account, awsRegion AS Region, date_trunc('month', eventTime) AS Month, COUNT(*) AS NumberofBucketsCreated
FROM ENTEREVENTDATASTOREID
WHERE eventName='CreateBucket'
GROUP BY recipientAccountId, awsRegion, date_trunc('month', eventTime)
ORDER BY COUNT(*) DESC

Figure shows the number of S3 buckets created across accounts and regions, aggregated by month

If you now run the query provided in Step 1e, the results show all the S3 buckets created across accounts and regions. More sample queries for AWS CloudTrail Lake are available in the AWS CloudTrail Lake query samples documentation.

Conclusion

In this blog post, I showed you how to consolidate and query CloudTrail data across accounts and regions using AWS CloudTrail Lake, for both historical CloudTrail logs and current events. For additional information on AWS CloudTrail Lake, please visit Working with AWS CloudTrail Lake.

Author photograph - Pranjal Gururani

Pranjal Gururani

Pranjal Gururani is a Solutions Architect at AWS based out of Seattle. Pranjal works with various customers to architect cloud solutions that address their business challenges. He enjoys hiking, kayaking, skydiving, and spending time with his family in his spare time.