AWS Cloud Operations Blog

Analyzing AWS Control Tower Drift with Amazon Bedrock

Introduction

AWS Control Tower is an easy place to start if you want to enforce governance and compliance best practices centrally across your AWS accounts. However, continuous compliance requires regular drift detection and remediation, which Control Tower supports by detecting drift and publishing notifications to Amazon Simple Notification Service (Amazon SNS) topics. AWS Control Tower drift occurs when the configuration of your AWS environment deviates from the baseline set by AWS Control Tower. Drift can result from manual changes made outside of Control Tower, unintended configuration changes, or updates to AWS services and features.

Unmanaged drift can undermine the effectiveness of centralized governance frameworks such as AWS Control Tower. Avoiding drift, and resolving it quickly when it does occur, helps keep your cloud environment compliant with organizational policies and industry regulations, which reduces risk. Drift can disrupt Control Tower’s ability to manage your Organizational Units (OUs) and the accounts in them, which Control Tower relies on to apply controls, monitor compliance, and enforce governance throughout your organization. Analyzing and remediating drift quickly therefore strengthens both your compliance operations and your overall security posture.

Understanding the various types of drift, determining their root causes, and remediating them all add operational overhead and take time that could otherwise go toward innovating on or modernizing your infrastructure and applications.

In this blog post, we show you how Amazon Bedrock can streamline this process. Amazon Bedrock is a fully managed service that provides access to foundation models, which you can use to analyze and understand the complex data associated with AWS Control Tower drift. For example, an Amazon Bedrock agent can help categorize the types of drift detected, identify likely root causes, and suggest remediation steps. By using the natural language processing capabilities of these models, you can quickly gain insights into your drift data and take appropriate action to maintain compliance without adding significant operational overhead.

In the next section, we’ll demonstrate how you can use Amazon Bedrock to analyze AWS Control Tower drift and outline the steps for remediation, helping ensure your AWS environment maintains consistent governance.

Overview of solution

This workflow diagram (Figure 1) illustrates an AWS cloud-native solution for detecting and responding to configuration drift. The process works as follows:

  1. The workflow starts in the management account, where Control Tower detects configuration drift. Control Tower then delivers the drift notification to the shared account designated for security notifications (the Audit account in a typical Control Tower setup, although this may vary based on your organization’s structure).
    1. AWS CloudTrail records the API calls behind the drift (for example, a MoveAccount call) as management events. You can also configure data events to capture more detailed information about relevant resources.
    2. The drift notification is published to an Amazon SNS topic.
  2. The SNS topic sends an email notification to its subscribers, as configured by AWS Control Tower.
  3. An Amazon Bedrock knowledge base, populated with AWS Control Tower documentation, supplies context to an Amazon Bedrock agent.

The Amazon Bedrock agent then analyzes the drift information and provides insights into the root cause and potential impact of the detected drift. This setup automates the detection, notification, and analysis of configuration drift in your AWS environment, combining several AWS services into a monitoring and response system that helps maintain the desired configuration across your cloud infrastructure.
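Because the evidence for a drift’s root cause usually lives in CloudTrail, it can help to pull the relevant events before (or while) asking the agent. The following is a minimal Python sketch for the ACCOUNT_MOVED_BETWEEN_OUS case. It assumes you pass in a boto3 CloudTrail client for the management account in us-east-1, where AWS Organizations records its API calls, and that the move falls within the 90 days of event history that LookupEvents searches. The function name and the 24-hour window are illustrative choices, not part of the original solution.

```python
import json
from datetime import datetime, timedelta, timezone


def find_move_account_events(cloudtrail_client, account_id, hours=24):
    """Return MoveAccount calls that moved `account_id` in the last `hours` hours.

    Assumes `cloudtrail_client` is a boto3 CloudTrail client for the management
    account in us-east-1, where AWS Organizations API calls are recorded.
    """
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(hours=hours)
    paginator = cloudtrail_client.get_paginator("lookup_events")
    pages = paginator.paginate(
        # LookupEvents accepts a single lookup attribute; we filter by account below.
        LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "MoveAccount"}],
        StartTime=start_time,
        EndTime=end_time,
    )
    matches = []
    for page in pages:
        for event in page.get("Events", []):
            # CloudTrailEvent holds the full event record as a JSON string.
            record = json.loads(event["CloudTrailEvent"])
            params = record.get("requestParameters") or {}
            if params.get("accountId") != account_id:
                continue
            matches.append({
                "event_time": record.get("eventTime"),
                "caller_arn": record.get("userIdentity", {}).get("arn"),
                "source_parent": params.get("sourceParentId"),
                "destination_parent": params.get("destinationParentId"),
            })
    return matches
```

For example, calling it with `boto3.client("cloudtrail", region_name="us-east-1")` and a placeholder account ID such as `111122223333` returns one dictionary per matching move (who made it, when, and between which parents), which you can include in a prompt to the agent.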

While this solution primarily focuses on drift detection and analysis, it lays the groundwork for potential future automation of remediation steps, pending careful consideration and implementation of appropriate safeguards.


Figure 1: Architecture diagram showing Amazon Bedrock analyzing AWS Control Tower Drift

Using Amazon Bedrock Agents to Analyze and Resolve Drift

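To demonstrate Amazon Bedrock’s capabilities, you can intentionally create drift in a test AWS Control Tower environment. A common method is to modify a Control Tower-managed resource outside of AWS Control Tower, for example by moving an enrolled account to a different OU directly in AWS Organizations.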
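Moving an account outside Control Tower is a single AWS Organizations API call. The sketch below assumes a boto3 Organizations client in the management account of a test landing zone and placeholder IDs that you supply; the helper name `create_test_drift` is hypothetical. It checks the account’s current parent first, so a typo in an ID does not move the wrong account.

```python
def create_test_drift(org_client, account_id, source_ou_id, destination_ou_id):
    """Move an enrolled account between OUs directly in AWS Organizations.

    Making this change outside AWS Control Tower is expected to surface as
    ACCOUNT_MOVED_BETWEEN_OUS drift. Run it only in a test landing zone.
    """
    # Refuse to move the account unless it is currently where we expect it to be.
    parents = org_client.list_parents(ChildId=account_id)["Parents"]
    current_parent = parents[0]["Id"]
    if current_parent != source_ou_id:
        raise ValueError(
            f"Account {account_id} is in {current_parent}, not {source_ou_id}; nothing moved."
        )
    org_client.move_account(
        AccountId=account_id,
        SourceParentId=source_ou_id,
        DestinationParentId=destination_ou_id,
    )
    return {"account_id": account_id, "from": source_ou_id, "to": destination_ou_id}
```

After testing, follow the remediation steps in the AWS Control Tower documentation for this drift type (or the steps the agent suggests) to bring the account back under Control Tower governance.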

When you receive the drift notification, you can use Amazon Bedrock to gain deeper insight into the issue. Bedrock can analyze the drift notification and explain the root cause of the drift. For example, if the notification reports drift of type ACCOUNT_MOVED_BETWEEN_OUS, Amazon Bedrock can explain which Organizational Unit (OU) the account was moved from and to, and why the change caused the drift.
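Before a notification reaches the agent, it can be useful to pull out the fields that matter and phrase them as a focused prompt. The following Python sketch does that. The field names (`DriftType`, `AccountId`, `OrganizationId`, `SourceId`, `DestinationId`, `Message`) are assumptions modeled on the sample notifications in the AWS Control Tower drift documentation, so check them against the messages your SNS topic actually delivers; both function names are hypothetical.

```python
import json


def summarize_drift_notification(sns_message):
    """Extract the fields needed for analysis from a drift notification (JSON string).

    Field names are assumptions based on sample Control Tower drift notifications;
    verify them against the messages your topic delivers.
    """
    body = json.loads(sns_message)
    summary = {
        "drift_type": body.get("DriftType", "UNKNOWN"),
        "account_id": body.get("AccountId"),
        "organization_id": body.get("OrganizationId"),
        "details": body.get("Message", ""),
    }
    if summary["drift_type"] == "ACCOUNT_MOVED_BETWEEN_OUS":
        # For account moves, the notification identifies both OUs involved.
        summary["source_ou"] = body.get("SourceId")
        summary["destination_ou"] = body.get("DestinationId")
    return summary


def build_analysis_prompt(summary):
    """Turn the summary into a prompt for the drift-analysis agent."""
    lines = [f"Analyze AWS Control Tower drift of type {summary['drift_type']}."]
    if summary.get("account_id"):
        lines.append(f"Affected account: {summary['account_id']}.")
    if "source_ou" in summary:
        lines.append(
            f"The account moved from {summary['source_ou']} to {summary['destination_ou']}."
        )
    lines.append("Explain the root cause, the impact, and the remediation steps.")
    lines.append(f"Notification text: {summary['details']}")
    return "\n".join(lines)
```

Keeping the raw notification text in the prompt lets the agent fall back on it when a field is missing or named differently than assumed here.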

Key Benefits of Using Amazon Bedrock for Cloud Governance

  • Proactive Monitoring and Alerts: AWS Control Tower publishes a notification to SNS when it detects drift, and Amazon Bedrock can analyze each notification as it arrives so your team can act quickly. To use Bedrock this way, set up a process that sends new drift notifications from your Control Tower SNS topic to a Bedrock agent, either as a scheduled job or triggered by each new drift event.
  • Root Cause Analysis: By identifying the underlying causes of configuration drift, Bedrock can help you prevent future occurrences, improving the stability and governance of your cloud infrastructure.
  • Step-by-Step Remediation: Bedrock can analyze the details of detected drift and explain its root cause, but remediating the drift will likely still require some manual effort, because Bedrock cannot currently remediate AWS Control Tower drift automatically. Bedrock can still streamline remediation: by analyzing the drift notification and correlating it with CloudTrail logs, it can identify the specific actions or configuration changes that led to the drift, which helps you quickly determine the appropriate remediation steps.
  • Automated Remediation: Bedrock can help generate code that, once you have reviewed and tested it, automatically remediates certain types of drift, reducing manual intervention and freeing your team to focus on more strategic tasks.
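The event-triggered option from the first bullet could look like the following AWS Lambda handler, subscribed to the Control Tower SNS topic, that forwards each drift notification to the agent. This is a minimal sketch, not part of the original solution. It assumes the agent already has an alias, that the function’s role allows `bedrock:InvokeAgent`, and that the agent and alias IDs are supplied through environment variables named `DRIFT_AGENT_ID` and `DRIFT_AGENT_ALIAS_ID` (hypothetical names). The optional `agent_client` parameter exists only so the function can be exercised without AWS credentials.

```python
import json
import os
import uuid


def ask_agent(agent_client, agent_id, alias_id, prompt, session_id=None):
    """Send one prompt to a Bedrock agent and return the concatenated reply."""
    response = agent_client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id or str(uuid.uuid4()),
        inputText=prompt,
    )
    parts = []
    for event in response["completion"]:
        # The completion stream yields byte chunks; other event types (e.g. traces) are skipped.
        if "chunk" in event:
            parts.append(event["chunk"]["bytes"].decode("utf-8"))
    return "".join(parts)


def handler(event, context, agent_client=None):
    """Lambda entry point for SNS-delivered Control Tower drift notifications."""
    if agent_client is None:
        import boto3  # imported lazily so the module loads without boto3 in local tests
        agent_client = boto3.client("bedrock-agent-runtime")
    agent_id = os.environ["DRIFT_AGENT_ID"]
    alias_id = os.environ["DRIFT_AGENT_ALIAS_ID"]
    analyses = []
    for record in event.get("Records", []):
        message = record["Sns"]["Message"]
        prompt = (
            "Analyze this AWS Control Tower drift notification, identify the root cause, "
            "and suggest remediation steps:\n" + message
        )
        analysis = ask_agent(agent_client, agent_id, alias_id, prompt)
        # Log the analysis so it lands in CloudWatch Logs for the team to review.
        print(json.dumps({"analysis": analysis}))
        analyses.append(analysis)
    return {"analyses": analyses}
```

The handler only reports findings; it does not change any resources, which keeps remediation a deliberate, reviewed step.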

Prerequisites

Before getting started, ensure you have:


Figure 2: AWS Control Tower SNS notifications

  • Permissions to read and process CloudTrail logs so you can analyze the root causes of drift, including access to:
    • Management events
    • Data events for specific resources relevant to Control Tower drift
    • CloudTrail Lake (read access), if you have implemented it
  • Permissions to access and configure Amazon Bedrock, including creating agents and processing data
  • Access to Anthropic foundation models in Amazon Bedrock
  • Amazon SNS permissions for subscribing to and publishing notifications
  • Amazon CloudWatch permissions for retrieving metric data

Apply the principle of least privilege, granting the Bedrock agents only the permissions they need to perform their tasks. For a comprehensive list of required permissions and best practices, refer to the official AWS documentation.
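As a starting point for a least-privilege review, the following Python sketch builds an IAM policy document that covers the prerequisites above, scoped to specific resources where practical. It is an assumption-laden example rather than a complete or authoritative permission set: the function name and parameters are hypothetical, it assumes the Claude 3.5 Sonnet model ID used later in this post, and in practice you would likely split these statements across separate roles (for example, the agent’s service role versus the role used for CloudTrail analysis). Check each action against the Service Authorization Reference before use.

```python
import json


def build_drift_agent_policy(region, knowledge_base_arn, docs_bucket, drift_topic_arn):
    """Return a starting-point IAM policy document (JSON) for the drift-analysis setup."""
    model_arn = (
        f"arn:aws:bedrock:{region}::foundation-model/"
        "anthropic.claude-3-5-sonnet-20240620-v1:0"
    )
    statements = [
        # Read-only lookups granted on "*" here; confirm resource-level support per action.
        {"Sid": "ReadCloudTrailEvents", "Effect": "Allow",
         "Action": ["cloudtrail:LookupEvents"], "Resource": "*"},
        {"Sid": "InvokeClaudeModel", "Effect": "Allow",
         "Action": ["bedrock:InvokeModel"], "Resource": model_arn},
        {"Sid": "QueryKnowledgeBase", "Effect": "Allow",
         "Action": ["bedrock:Retrieve"], "Resource": knowledge_base_arn},
        {"Sid": "ReadDriftDocs", "Effect": "Allow",
         "Action": ["s3:GetObject", "s3:ListBucket"],
         "Resource": [f"arn:aws:s3:::{docs_bucket}", f"arn:aws:s3:::{docs_bucket}/*"]},
        {"Sid": "UseDriftTopic", "Effect": "Allow",
         "Action": ["sns:Subscribe", "sns:Publish"], "Resource": drift_topic_arn},
        {"Sid": "ReadMetrics", "Effect": "Allow",
         "Action": ["cloudwatch:GetMetricData"], "Resource": "*"},
    ]
    return json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=2)
```

Generating the document in code makes it easy to review in version control and to tighten further as you learn which actions the agent actually uses.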

Agent Creation Workflow

To get started with using Amazon Bedrock to analyze and remediate AWS Control Tower drift, we’ll need to first create custom agents within the Bedrock service. These agents will be configured with specific instructions and capabilities to handle different aspects of the drift analysis and remediation process.

Here’s the step-by-step process for creating the Bedrock agents:

Bedrock Agent: Analyze Root Cause

  1. Sign in to the Amazon Bedrock console and select Agents in the left navigation pane.
  2. Select Create Agent.

Provide the following details:

  • Agent name: drift-root-cause-agent
  • Agent description: This agent helps you analyze the root cause of drift and it will give you the steps for remediation.
  • Create New Role: Add the permissions to the role that were mentioned in the prerequisites.
  • Select model: Anthropic Claude (versions may vary; for our testing, we used Claude 3.5 Sonnet).
  • Instructions for the Agent: Identify root cause of Control Tower drift and associated CloudTrail event.
  • Select Save and Exit, then Prepare.

Figure 3: Amazon Bedrock Agent Configuration Details.
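If you prefer to script the console steps above, the following Python sketch creates and prepares an agent with the same name, description, instructions, and model, using a boto3 `bedrock-agent` client that you pass in. The role ARN is yours to supply, the polling parameters are illustrative, and the helper name is hypothetical. It waits for the agent to leave the CREATING state before calling `prepare_agent`, because the agent can only be prepared once creation finishes.

```python
import time

MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # Claude 3.5 Sonnet, as used in our testing


def create_drift_agent(agent_client, role_arn, poll_seconds=5, timeout_seconds=300):
    """Create and prepare drift-root-cause-agent; return its agent ID."""
    agent = agent_client.create_agent(
        agentName="drift-root-cause-agent",
        description=("This agent helps you analyze the root cause of drift "
                     "and it will give you the steps for remediation."),
        foundationModel=MODEL_ID,
        instruction=("Identify root cause of Control Tower drift "
                     "and associated CloudTrail event."),
        agentResourceRoleArn=role_arn,
    )["agent"]
    agent_id = agent["agentId"]

    # The agent starts in CREATING; it can be prepared once it reaches NOT_PREPARED.
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = agent_client.get_agent(agentId=agent_id)["agent"]["agentStatus"]
        if status == "NOT_PREPARED":
            break
        if status == "FAILED" or time.monotonic() > deadline:
            raise RuntimeError(f"Agent {agent_id} is not ready (status {status})")
        time.sleep(poll_seconds)

    agent_client.prepare_agent(agentId=agent_id)
    return agent_id
```

To call the agent from code (as in the later examples), you also need an agent alias, which you can create in the console or with the `create_agent_alias` operation.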

Creating a Knowledge Base

An Amazon Bedrock knowledge base is a repository of your own information that the agent can retrieve from when it answers queries. It supplies the facts and context the model needs to give accurate and relevant responses to users’ questions and requests.

For the agent to understand queries about drift, upload relevant AWS Control Tower documentation to an Amazon Simple Storage Service (Amazon S3) bucket in the Audit account, and use that bucket as the knowledge base’s data source. You can find this documentation in the official AWS Control Tower User Guide.

You can download relevant sections of the guide and upload them to your S3 bucket. Uploading a sample Control Tower drift notification in JSON format also lets you test queries against the agent’s instructions.

Follow AWS best practices for securing S3 buckets when storing this information. After you create the knowledge base from this bucket, attach it to your drift analysis agent, as shown in Figure 4.


Figure 4: Attach the new Knowledge Base to your drift analysis agent.
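The upload-and-sync step can also be scripted. The following Python sketch uploads every file in a local folder to the documentation bucket with SSE-KMS server-side encryption, then starts an ingestion job so the knowledge base picks up the new documents. It assumes boto3 `s3` and `bedrock-agent` clients are passed in and that the knowledge base and its S3 data source already exist; the bucket name, key prefix, and function names are hypothetical.

```python
from pathlib import Path


def upload_kb_documents(s3_client, bucket, local_dir, prefix="control-tower-docs/"):
    """Upload each file in local_dir to s3://bucket/prefix, encrypted with SSE-KMS."""
    uploaded = []
    for path in sorted(Path(local_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        key = prefix + path.name
        s3_client.upload_file(
            str(path), bucket, key,
            ExtraArgs={"ServerSideEncryption": "aws:kms"},
        )
        uploaded.append(key)
    return uploaded


def sync_knowledge_base(agent_client, knowledge_base_id, data_source_id):
    """Start an ingestion job so the knowledge base indexes the uploaded documents."""
    job = agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    return job["ingestionJob"]["ingestionJobId"]
```

Run the sync again whenever you add or update documentation, such as new sections of the AWS Control Tower User Guide or additional sample drift notifications.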

Test the Agent

Prompt engineering is the practice of crafting and refining the text input to a large language model (LLM) to obtain the desired responses. The following prompts are the ones we used to test the Bedrock agent.

Prompts:

  • Example prompt: “Tell me about an event of Control Tower Drift that has occurred in my managed account”
  • Example prompt: “Analyze drift for Account ID: xxxxxxxxxxxx in Organization ID: o-exampleorgid”
    • The agent’s response includes remediation steps.
  • Example prompt: “Can you elaborate more on the remediation”

Figure 5: Example prompts to the Amazon Bedrock agent for remediation steps.

The agent should respond to the prompts and consistently provide steps to remediate drift in the AWS Control Tower environment, as shown in the example above.
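A follow-up prompt such as “Can you elaborate more on the remediation” only makes sense if the agent remembers the previous turn, so a scripted test should send all prompts in the same session. The following Python sketch does that with a boto3 `bedrock-agent-runtime` client that you pass in, along with your agent ID and alias ID. The function name is hypothetical, and the account and organization IDs in the prompts are the placeholders from this post.

```python
import uuid


def run_drift_conversation(agent_client, agent_id, alias_id, prompts):
    """Send prompts to the agent in one session and return (session_id, replies)."""
    # Reusing a single session ID lets follow-up prompts build on earlier answers.
    session_id = str(uuid.uuid4())
    replies = []
    for prompt in prompts:
        response = agent_client.invoke_agent(
            agentId=agent_id,
            agentAliasId=alias_id,
            sessionId=session_id,
            inputText=prompt,
        )
        # The completion stream yields byte chunks; other event types are skipped.
        text = "".join(
            event["chunk"]["bytes"].decode("utf-8")
            for event in response["completion"]
            if "chunk" in event
        )
        replies.append(text)
    return session_id, replies


PROMPTS = [
    "Analyze drift for Account ID: xxxxxxxxxxxx in Organization ID: o-exampleorgid",
    "Can you elaborate more on the remediation",
]
```

Scripting the conversation also makes it easier to rerun the same prompts across different drift scenarios when you check the agent against the criteria that follow.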

Our Amazon Bedrock agent should return:

  1. Accurate identification of the drift event: The agent should correctly recognize and describe the specific Control Tower drift that occurred.
  2. Detailed root cause analysis: It should provide a clear explanation of why the drift happened, identifying the actions or changes that led to the deviation from the Control Tower baseline.
  3. Potential impact assessment: The agent should outline possible consequences of the drift on your AWS environment’s compliance and security posture.
  4. Actionable remediation steps: It should offer specific, step-by-step guidance on how to resolve the drift and bring the environment back into compliance with Control Tower standards.
  5. Relevant AWS documentation references: The agent should provide links to official AWS documentation for further information on the drift type and remediation processes.
  6. Consistency across different drift scenarios: We want the agent to maintain this level of detailed analysis and guidance for various types of Control Tower drift, not just for one specific example.

Clean up

To avoid incurring future charges and maintain a clean environment, follow these steps to delete the resources created during this exercise:

  1. Delete Bedrock agents and associated knowledge base.
  2. Delete any S3 buckets created to store documentation or drift data.
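  3. Remove any SNS topics and subscriptions you created for this exercise that are no longer needed. Do not delete the SNS topics that AWS Control Tower manages.
  4. Review and delete any IAM roles or policies created for this exercise that are no longer needed.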

Remember to review your AWS account for any other resources that might have been created during this project and remove them if they’re no longer needed. Always double-check before deleting resources to ensure you’re not removing anything critical to your operations.
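To make the double-check explicit, the following Python sketch covers steps 1 and 2 with a dry run by default: it returns the list of planned deletions and only deletes when you pass `dry_run=False`. It assumes boto3 `bedrock-agent` and `s3` clients are passed in, that the bucket is not versioned (versioned buckets also need their object versions removed), and that the agent has no aliases still in use; the function name is hypothetical. It deliberately leaves SNS and IAM cleanup to you, so it cannot touch AWS Control Tower’s own topics.

```python
def delete_exercise_resources(agent_client, s3_client, agent_id, knowledge_base_id,
                              bucket, dry_run=True):
    """Delete the agent, knowledge base, and documentation bucket from this exercise.

    With dry_run=True (the default) nothing is deleted; the planned actions are
    returned so you can review them first. Assumes the bucket is not versioned.
    """
    actions = [f"delete agent {agent_id}", f"delete knowledge base {knowledge_base_id}"]
    if not dry_run:
        agent_client.delete_agent(agentId=agent_id)
        agent_client.delete_knowledge_base(knowledgeBaseId=knowledge_base_id)

    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        keys = [obj["Key"] for obj in page.get("Contents", [])]
        if not keys:
            continue
        actions.append(f"delete {len(keys)} objects from {bucket}")
        if not dry_run:
            # Each page holds at most 1,000 keys, which is also the delete_objects limit.
            s3_client.delete_objects(
                Bucket=bucket, Delete={"Objects": [{"Key": key} for key in keys]}
            )

    actions.append(f"delete bucket {bucket}")
    if not dry_run:
        s3_client.delete_bucket(Bucket=bucket)
    return actions
```

Review the returned list first, then run the function again with `dry_run=False` once you are sure nothing in it is still needed.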

Conclusion

Using Amazon Bedrock to analyze AWS Control Tower drift provides a powerful way to maintain compliance and governance in your AWS environment. By analyzing the root cause of drift and identifying remediation steps, you can keep your AWS resources aligned with best practices and ensure that Control Tower can maintain governance over your landing zone.

For more information, see the Amazon Bedrock and AWS Control Tower documentation.


Next Steps

In this blog post, we’ve demonstrated how Amazon Bedrock can be used to analyze and gain insights into AWS Control Tower drift, helping you maintain compliance and security in your multi-account AWS environment.

To further enhance your drift detection and remediation capabilities, we recommend the following next steps:

  1. Explore more features of Amazon Bedrock: Investigate additional Bedrock capabilities, such as customizable data processing pipelines and advanced analytics, to tailor the solution to your specific needs.
  2. Implement continuous monitoring: Set up a robust Bedrock-powered monitoring system to proactively manage compliance in your AWS environment, ensuring you can quickly identify and resolve drift issues.
  3. Stay updated with AWS best practices: Regularly review AWS guidance and updates to optimize your cloud governance strategy and ensure your Control Tower and Bedrock configurations align with the latest recommended practices.

Ready to get started? Schedule a consultation with our AWS experts to learn how Amazon Bedrock can be integrated into your AWS Control Tower drift management process. Our team can help you design and implement a custom solution that meets your specific requirements and helps you maintain a secure, compliant, and well-governed cloud environment.

About the authors


Abraham Musa

Abraham is a Cloud Operations Specialist Solutions Architect with the Cloud Foundations team at AWS based out of New York. He specializes in AWS Control Tower, AWS Organizations, AWS Service Catalog, and AWS Config. Abraham is a United States Army Veteran and enjoys traveling.


Andres Mejia

Andres Mejia is a Federal Civilian Solutions Architect who specializes in Cloud Operations. Andres has been a Solutions Architect for the last 2 years and enjoys being a trusted advisor for his federal customers. Outside of work, he enjoys playing sports, cooking, and spending time with family and friends.


Craig Edwards

Craig Edwards is a World Wide Technologist with the Critical Capabilities team at AWS based out of Boston, Massachusetts. He specializes in AWS Config, AWS CloudTrail, AWS Audit Manager, and AWS Systems Manager. Craig is a United States Air Force Veteran, and when he is not building cloud solutions, he enjoys fatherhood and electric vehicles.


Zukhra Salieva

Zukhra is a DevOps consultant who specializes in guiding customers through their cloud journey and optimizing and automating cloud operations. She brings expertise in managing code releases, deployments, and building robust cloud infrastructures. Outside of work, she dedicates time to serving her community and enjoys spending time with family.