AWS Open Source Blog
Compliance as code and auto-remediation with Cloud Custodian
Many organizations identify governance and compliance as challenges, and a lack of visibility into cloud infrastructure as a prevalent problem. Companies spend thousands of hours a year maintaining compliance. Automating compliance monitoring and response not only reduces the maintenance burden, but also increases visibility across cloud environments. As the cost and human effort of keeping up with compliance grow, validating and enforcing nearly continuous compliance with auto-remediation can improve the overall security posture and reduce compliance costs.
We know that implementing infrastructure as code with AWS Cloud Development Kit (AWS CDK) makes it possible to realize Policy-as-Code across AWS resources via Open Policy Agent. That approach is mostly about “preventive” controls across AWS resources when considering business and governance requirements. In this post, we will discuss how to enable “detective” and “responsive” controls to enforce nearly continuous compliance.
Cloud Custodian is an open source, stateless rules engine that manages AWS environments. It consolidates many of the compliance scripts organizations use into a lightweight and flexible tool. With Cloud Custodian, we can easily set rules that validate and enforce the environment against security and compliance standards.
AWS Lambda provides powerful, real-time, and event-driven code execution. It responds to AWS resources’ behaviors. Cloud Custodian offers policy-level execution against multiple kinds of event streams, including Amazon CloudWatch Events, AWS CloudTrail events, and more. Each Cloud Custodian policy can deploy as an independent Lambda function.
With policies that run in AWS Lambda, Cloud Custodian enforces compliance as code and auto-remediation, enabling organizations to simultaneously move fast and stay secure. Real-time visibility into who made what changes from where enables us to detect misconfigurations and non-compliance. We can respond quickly to prevent risks from materializing.
The following steps demonstrate how to enable nearly continuous compliance with Cloud Custodian and AWS Lambda:
- Set up AWS resources for testing.
- Write Cloud Custodian policies.
- Validate and enforce the policies.
Prerequisites
Enough AWS knowledge to interact with the AWS Management Console and to spin up Amazon Elastic Compute Cloud (Amazon EC2) instances makes the following steps more manageable.
Creating an EC2 environment in AWS Cloud9
We use the AWS Cloud9 environment for the rest of the post. Follow these instructions to create an Amazon Linux Cloud9 EC2 environment as the workspace.
Note: Cloud Custodian policy execution may alter AWS resources. For the purpose of this blog post and learning, do not try this in production. Use a test or sandbox account.
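To begin, install Cloud Custodian in the Cloud9 terminal. It is published on PyPI as the c7n package, so running pip install c7n inside a Python virtual environment is usually sufficient.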
Getting started
Set up AWS resources for testing
Create an Amazon EC2 instance
Launch an Amazon EC2 instance (a t2.micro, for example) and add a tag with the key Custodian-Testing.
Any value works, and we can add the tag either when launching the instance or afterwards. One of the Cloud Custodian policies later in this post checks the instance for the existence of the Custodian-Testing tag.
Confirm that the EC2 instance appears in the console and that it carries the Custodian-Testing tag.
Create an AWS IAM policy
When a Cloud Custodian policy runs, the necessary Lambda functions are created automatically. We must specify a Lambda role with permissions for the operations the policies perform on AWS resources. Based on the Cloud Custodian policies in this post, we must create an IAM policy with the following permissions.
Note: Replace the variables (region, account_id, ec2_id) with your region, account ID, and the EC2 instance ID. The ec2-tag-compliance-mark policy marks and stops the previously created EC2 instance.
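The permissions document itself is not reproduced here, so the following is only a minimal sketch of what such an IAM policy could look like. It assumes the role needs to read EC2 and security group state, tag and stop the test instance, and revoke security group ingress rules. The file name custodian-iam-policy.json, the Sid labels, and the default values for region, account_id, and ec2_id are illustrative assumptions; adjust the action list to what your own policies actually do.

```shell
# Placeholder values (assumptions): replace with your region, account ID, and instance ID.
region="${region:-us-east-1}"
account_id="${account_id:-123456789012}"
ec2_id="${ec2_id:-i-0123456789abcdef0}"

# Write the policy document; the unquoted EOF lets the shell fill in the variables.
cat > custodian-iam-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadResourceState",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeTags"
      ],
      "Resource": "*"
    },
    {
      "Sid": "MarkAndStopTestInstance",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags",
        "ec2:StopInstances"
      ],
      "Resource": "arn:aws:ec2:${region}:${account_id}:instance/${ec2_id}"
    },
    {
      "Sid": "RemoveOpenIngressRules",
      "Effect": "Allow",
      "Action": "ec2:RevokeSecurityGroupIngress",
      "Resource": "arn:aws:ec2:${region}:${account_id}:security-group/*"
    }
  ]
}
EOF
```

Scoping the write actions to a single instance ARN keeps a mistake in a policy filter from stopping or tagging anything other than the test instance.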
Create an AWS IAM role
Create an IAM role and attach the preceding policy along with the AWS managed policies AWSLambdaBasicExecutionRole and AWSConfigRulesExecutionRole, which cover Lambda and AWS Config rule execution.
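For readers who prefer the AWS CLI to the console, the role creation step might be scripted roughly as follows. This is a sketch under assumptions: the file name custodian-trust-policy.json and the helper create_custodian_role are hypothetical, and the helper is only defined here, not called, because it requires a configured AWS CLI and a sandbox account.

```shell
# Trust policy that lets the AWS Lambda service assume the role.
cat > custodian-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Hypothetical helper: creates the role and attaches the three policies.
# $1 = role name, $2 = ARN of the customer managed policy created earlier.
create_custodian_role() {
  aws iam create-role \
    --role-name "$1" \
    --assume-role-policy-document file://custodian-trust-policy.json &&
  aws iam attach-role-policy --role-name "$1" --policy-arn "$2" &&
  aws iam attach-role-policy --role-name "$1" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole &&
  aws iam attach-role-policy --role-name "$1" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSConfigRulesExecutionRole
}

# Example call (role and policy names are placeholders):
# create_custodian_role custodian-lambda-role arn:aws:iam::123456789012:policy/custodian-iam-policy
```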
Write Cloud Custodian policies
Cloud Custodian policies are YAML files, a human-readable format that makes them straightforward to write.
The policies usually include the following:
- The type of resource to run the policy against
- Filters to narrow down the set of resources
- Actions to take on the filtered set of resources
To find out more, check out Cloud Custodian documentation.
To begin, log in to the AWS Cloud9 terminal and use the IDE in the following steps.
Set an environment variable to the ARN of the role created in the preceding step. The ARN has the following format:
arn:aws:iam::xxx:role/xxx
Remember to replace ${Custodian_Lambda_Role_Arn_Value} with the value for your environment:
export Custodian_Lambda_Role_Arn=${Custodian_Lambda_Role_Arn_Value}
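Because a mistyped ARN only surfaces later, when the Lambda functions are deployed, it can help to check the variable right away. The following is a loose sanity check, not a full ARN validator; the helper name is_role_arn and the placeholder ARN are assumptions made for illustration.

```shell
# Placeholder ARN (assumption), used only when the variable is not already set.
Custodian_Lambda_Role_Arn="${Custodian_Lambda_Role_Arn:-arn:aws:iam::123456789012:role/custodian-lambda-role}"

# Hypothetical helper: succeeds when $1 loosely looks like an IAM role ARN,
# that is, the account field starts with a digit and a role name is present.
is_role_arn() {
  case "$1" in
    arn:aws:iam::[0-9]*:role/?*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_role_arn "$Custodian_Lambda_Role_Arn"; then
  echo "role ARN looks valid: $Custodian_Lambda_Role_Arn"
else
  echo "unexpected role ARN format: $Custodian_Lambda_Role_Arn" >&2
fi
```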
Generate the policy file using the Custodian_Lambda_Role_Arn variable.
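The policy file is not reproduced here, so what follows is a sketch of how such a file could be generated, not the exact file from the original post. The policy names ec2-invalid-ami and ec2-tag-compliance-mark come from this post; the name sg-remove-ingress-rule, the AMI ID, and the placeholder role ARN are assumptions. The stop and remove-permissions actions are commented out so that a first run only reports matches.

```shell
# Placeholder role ARN (assumption), used only when the variable is not already exported.
Custodian_Lambda_Role_Arn="${Custodian_Lambda_Role_Arn:-arn:aws:iam::123456789012:role/custodian-lambda-role}"

# The unquoted EOF lets the shell substitute the role ARN into the YAML.
cat > custodian_polices.yml <<EOF
policies:
  # Runs once a day from a CloudWatch scheduled event.
  - name: ec2-invalid-ami
    resource: aws.ec2
    mode:
      type: periodic
      schedule: "rate(1 day)"
      role: ${Custodian_Lambda_Role_Arn}
    filters:
      - "State.Name": running
      - type: value
        key: ImageId
        op: in
        value:
          - ami-0123456789abcdef0   # hypothetical invalid AMI ID
    # actions:
    #   - stop

  # Runs when CloudTrail records a new security group ingress rule.
  - name: sg-remove-ingress-rule
    resource: aws.security-group
    mode:
      type: cloudtrail
      role: ${Custodian_Lambda_Role_Arn}
      events:
        - source: ec2.amazonaws.com
          event: AuthorizeSecurityGroupIngress
          ids: "requestParameters.groupId"
    filters:
      - or:
          - type: ingress
            Cidr:
              value: "0.0.0.0/0"
            Ports: [22]
          - type: ingress
            CidrV6:
              value: "::/0"
            Ports: [22]
    # actions:
    #   - type: remove-permissions
    #     ingress: matched

  # Evaluated through an AWS Config rule.
  - name: ec2-tag-compliance-mark
    resource: aws.ec2
    mode:
      type: config-rule
      role: ${Custodian_Lambda_Role_Arn}
    filters:
      - "State.Name": running
      - "tag:Custodian-Testing": present
    actions:
      - type: mark-for-op
        op: stop
        days: 1
EOF
```

Note that mark-for-op only writes a tag recording the pending operation; actually stopping the instance normally takes a follow-up policy that uses a marked-for-op filter, which is not shown in this sketch.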
The three policies in this example fulfill the following tasks:
- Find all running EC2 instances that are using invalid AMIs and stop them. The policy execution creates a Lambda function that is invoked by CloudWatch scheduled events.
- Find any security group that allows inbound traffic from 0.0.0.0/0 or ::/0 on port 22 and remove the rule. The policy execution creates a Lambda function that is invoked by CloudTrail events.
- Find all non-compliant tagged EC2 instances and mark them to stop in one day. This creates a Lambda function and an AWS Config rule.
Make sure you understand the effects before uncommenting the action sections of the preceding policies.
Validate and enforce the Cloud Custodian policy
We must validate Cloud Custodian policies against the JSON schema before processing.
$ custodian validate custodian_polices.yml
Dry-run the Cloud Custodian policy
Performing a dry run before running the policies against real infrastructure is usually preferred.
$ custodian run --dryrun custodian_polices.yml -s out
The preceding command created several files in the output directory specified via -s (--output-dir), which is out in this case. Each policy provides metrics for Resource Count, Resource Time, and Action Time.
$ less out/ec2-tag-compliance-mark/metadata.json
The following image shows the metrics:
Next, we use the report subcommand to summarize the results of the ec2-invalid-ami policy:
Run Cloud Custodian policy
Everything looks as expected, so now we are going to run the policies:
$ custodian run custodian_polices.yml -s out
Running the policies creates the Lambda functions and the AWS Config rule if they do not already exist; otherwise, it updates them.
Lambda functions:
AWS Config rule:
The AWS Config rule dashboard shows one non-compliant resource.
Check the previously created EC2 instance. We should see that the ec2-tag-compliance-mark policy execution added a tag named maid_status. The tag marks the instance to stop one day later.
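The maid_status value encodes both the pending operation and its date. As a rough sketch, and assuming the policy uses Cloud Custodian's default mark-for-op message (text of the form "Resource does not meet policy: stop@YYYY/MM/DD"), the value can be split with plain shell parameter expansion. The sample value and the variable names are illustrative, not output captured from the instance.

```shell
# Sample tag value (assumption): default mark-for-op message with a made-up date.
maid_status="Resource does not meet policy: stop@2021/03/15"

# Keep what follows the last ": ", which is "stop@2021/03/15".
op_and_date="${maid_status##*: }"

# Split on "@" into the operation and the scheduled date.
op="${op_and_date%%@*}"
action_date="${op_and_date#*@}"

echo "operation=${op} date=${action_date}"
```

If a policy sets a custom message or tag name on mark-for-op, the parsing would need to change accordingly.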
Additionally, each Lambda function deployed through Cloud Custodian creates a CloudWatch log group named after the rule. We can refer to the logs when troubleshooting.
The following logs show the results of the ec2-tag-compliance-mark policy execution:
We can see that the AWS Lambda function evaluated the defined policy and carried out the auto-remediation.
Clean up
Delete the AWS Cloud9 EC2 environment from the console. Remove the test EC2 instance, as well as the Lambda functions and the AWS Config rule created by running the Cloud Custodian policies.
Wrap up
This example demonstrates implementing compliance as code with Cloud Custodian and how to use AWS Lambda to complete auto-remediation. Cloud Custodian enables us to define rules and remediation efforts with AWS Lambda as one policy to facilitate a well-managed cloud infrastructure. Every organization has a set of policies to follow for detecting violations and taking remediation actions on their AWS resources.
By using Cloud Custodian and AWS Lambda to enforce compliance as code and auto-remediation, we are able to:
- Construct policies ranging from simple queries to complex workflows, using the easy-to-read DSL, to carry out remediation automatically.
- Get governance as a core capability via a YAML DSL rules engine that integrates with serverless for real-time reaction.
- Achieve nearly continuous compliance by actively enforcing the policies and conforming to internal best practices and guidelines.
DevOps processes can incorporate automated security testing and compliance, bringing us much closer to DevSecOps. Cloud Custodian addresses the challenges of security enforcement, tagging, cleanup of unused or invalid resources, account maintenance, cost control, and backups.
Let your imagination run wild and use these tools to get more visibility and control over your entire AWS environment.