AWS Compute Blog
Orchestrating a security incident response with AWS Step Functions
In this post, I show how to implement the callback pattern in an AWS Step Functions Standard Workflow. The pattern adds a manual approval step to an automated security incident response framework. The framework could be extended to remediate automatically, according to the individual policy actions defined, for example by applying alternative actions or by restricting actions to specific ARNs.
The code examples shown in this post can be found in this GitHub repository.
The application uses Amazon EventBridge to trigger a Step Functions Standard Workflow on an IAM policy creation event. The workflow compares the policy action against a customizable list of restricted actions. It uses AWS Lambda and Step Functions to roll back the policy temporarily, then notify an administrator and wait for them to approve or deny.
Important: the application uses various AWS services, and there are costs associated with these services after the Free Tier usage. See the AWS pricing page for details.
Deploy this application using the AWS Serverless Application Model Command Line Interface (AWS SAM CLI). You then create a new IAM policy to trigger the rule and run the application.
Deploying the application
Follow the instructions below in order to deploy from this GitHub repository:
- Create an AWS account if you do not already have one, and log in.
- Clone the repo onto your local development machine:
git clone https://github.com/aws-samples/automating-a-security-incident-with-step-functions.git
- In the root directory, from the command line, run:
sam deploy --guided
- Follow the prompts in the deploy process to set the application name, email address, and restrictedActions:
Once the deployment process is complete, 21 new resources are created. These include:
- Five Lambda functions that contain the business logic.
- An Amazon EventBridge rule.
- An Amazon SNS topic and subscription.
- An Amazon API Gateway REST API with two resources.
- An AWS Step Functions state machine.
To receive Amazon SNS notifications as the application administrator, you must confirm the subscription to the SNS topic. To do this, choose the Confirm subscription link in the verification email that was sent to you when deploying the application.
EventBridge receives new events in the default event bus. Here, the event is compared with associated rules. Each rule has an event pattern defined, which acts as a filter to match inbound events to their corresponding rules. In this application, a matching event rule triggers an AWS Step Functions execution, passing in the event payload from the policy creation event.
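As a rough illustration of how an event pattern filters inbound events, the sketch below defines a pattern for IAM policy creation and a simplified matcher. The pattern values are an assumption based on how CloudTrail reports IAM API calls, not a copy of the rule in the repository, and `matchesPattern` is a hypothetical helper. EventBridge performs the real matching and supports more operators than this.

```javascript
// Assumed event pattern for IAM policy creation, as recorded by AWS CloudTrail.
// The rule deployed by the application may differ.
const createPolicyPattern = {
  source: ["aws.iam"],
  "detail-type": ["AWS API Call via CloudTrail"],
  detail: {
    eventSource: ["iam.amazonaws.com"],
    eventName: ["CreatePolicy"],
  },
};

// Simplified matcher (hypothetical helper): every key in the pattern must be
// present in the event. An array lists the accepted values for a field, and a
// nested object is matched recursively against the nested event object.
function matchesPattern(pattern, event) {
  return Object.entries(pattern).every(([key, expected]) => {
    const actual = event[key];
    if (Array.isArray(expected)) {
      return expected.includes(actual);
    }
    return (
      typeof actual === "object" &&
      actual !== null &&
      matchesPattern(expected, actual)
    );
  });
}

// Trimmed-down example event containing only the fields the pattern inspects.
const sampleEvent = {
  source: "aws.iam",
  "detail-type": "AWS API Call via CloudTrail",
  detail: { eventSource: "iam.amazonaws.com", eventName: "CreatePolicy" },
};

console.log(matchesPattern(createPolicyPattern, sampleEvent)); // true
```

An event with a different eventName, such as DeletePolicy, fails the match and would not start an execution.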
Running the application
Trigger the application by creating a policy either via the AWS Management Console or with the AWS Command Line Interface.
Using the AWS CLI
First install and configure the AWS CLI, then run the following command:
aws iam create-policy --policy-name my-bad-policy1234 --policy-document '{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"s3:GetBucketObjectLockConfiguration",
"s3:DeleteObjectVersion",
"s3:DeleteBucket"
],
"Resource": "*"
}
]
}'
Using the AWS Management Console
- Go to Services > Identity and Access Management (IAM) dashboard.
- Choose Create policy.
- Choose the JSON tab.
- Paste the following JSON:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketObjectLockConfiguration",
        "s3:DeleteObjectVersion",
        "s3:DeleteBucket"
      ],
      "Resource": "*"
    }
  ]
}
- Choose Review policy.
- In the Name field, enter my-bad-policy.
- Choose Create policy.
Either of these methods creates a policy with the permissions required to delete Amazon S3 buckets. Deleting an S3 bucket is one of the restricted actions set when the application is deployed:
This sends the event to EventBridge, which then triggers the Step Functions state machine. The Step Functions state machine holds each state object in the workflow. Some of the state objects use the Lambda functions created during deployment to process data.
Others use Amazon States Language (ASL) enabling the application to conditionally branch, wait, and transition to the next state. Using a state machine decouples the business logic from the compute functionality.
After triggering the application, go to the Step Functions dashboard and choose the newly created state machine. Choose the current running state machine from the executions table.
You see a visual representation of the current execution, with the workflow paused at the AskUser state.
These are the states in the workflow:
ModifyData
State type: Pass
Re-structures the input data into an object that is passed throughout the workflow.
ValidatePolicy
State type: Task. Service: AWS Lambda
Invokes the ValidatePolicy Lambda function that checks the new policy document against the restricted actions.
ChooseAction
State type: Choice
Branches depending on input from ValidatePolicy step.
TempRemove
State type: Task. Service: AWS Lambda
Creates a new default version of the policy with only permissions for Amazon CloudWatch Logs and deletes the previously created policy version.
AskUser
State type: Task. Service: AWS Lambda
Sends an approval email to the user via SNS, including the task token used by the callback pattern, then pauses until the token is returned.
UsersChoice
State type: Choice
Branches based on the user's choice to approve or deny.
Denied
State type: Pass
Ends the execution with no further action.
Approved
State type: Task. Service: AWS Lambda
Restores the initial policy document by creating it as a new version.
AllowWithNotification
State type: Task. Service: AWS Lambda
With no restricted actions detected, the user is still notified of the change (via an email from SNS) before the execution ends.
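The check performed in the ValidatePolicy state can be sketched as follows. This is a minimal illustration, not the code from the repository: `findRestrictedActions` and `toArray` are hypothetical helpers, and the restricted list is an example apart from s3:DeleteBucket, which this post names as restricted. The sketch only does exact, case-insensitive comparison and does not expand wildcards such as s3:*.

```javascript
// IAM allows Statement and Action to be a single item or an array.
// This hypothetical helper normalizes both forms to an array.
function toArray(value) {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Returns the actions from "Allow" statements that appear in the restricted
// list. Comparison is case-insensitive because IAM action names are.
// Wildcards (for example "s3:*") are NOT expanded in this sketch.
function findRestrictedActions(policyDocument, restrictedActions) {
  const restricted = new Set(restrictedActions.map((a) => a.toLowerCase()));
  const found = [];
  for (const statement of toArray(policyDocument.Statement)) {
    if (statement.Effect !== "Allow") continue;
    for (const action of toArray(statement.Action)) {
      if (restricted.has(action.toLowerCase())) found.push(action);
    }
  }
  return found;
}

// The policy document used earlier in this post.
const policyDocument = {
  Version: "2012-10-17",
  Statement: [
    {
      Sid: "VisualEditor0",
      Effect: "Allow",
      Action: [
        "s3:GetBucketObjectLockConfiguration",
        "s3:DeleteObjectVersion",
        "s3:DeleteBucket",
      ],
      Resource: "*",
    },
  ],
};

// Example list; only s3:DeleteBucket is taken from this post.
const restrictedActions = ["s3:DeleteBucket", "s3:DeleteObject"];

console.log(findRestrictedActions(policyDocument, restrictedActions));
// [ 's3:DeleteBucket' ]
```

A non-empty result would route the workflow to TempRemove, while an empty result would route it to AllowWithNotification.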
The callback pattern
An important feature of this application is the ability for an administrator to approve or deny a new policy. The Step Functions callback pattern makes this possible.
The callback pattern allows a workflow to pause during a task and wait for an external process to return a task token. The task token is generated when the task starts. When the AskUser function is invoked, it is passed a task token. The task token is published to the SNS topic along with the API resources for approval and denial. These API resources are created when the application is first deployed.
When the administrator chooses the approve or deny link, the token is passed with the API request to the receiveUser Lambda function. This Lambda function uses the incoming task token to resume the AskUser state.
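Because the task token travels in a query string, it helps to see one way of carrying it safely through a URL. The sketch below is an alternative to the JSON.stringify approach used in this post's code, not the repository's implementation. It uses the standard URL class to encode the token. The endpoint and token values are made up, and `buildCallbackLink` and `readToken` are hypothetical helpers.

```javascript
// Builds an approve or deny link that carries the task token as a
// URL-encoded "token" query string parameter.
function buildCallbackLink(endpoint, taskToken) {
  const url = new URL(endpoint);
  url.searchParams.set("token", taskToken);
  return url.toString();
}

// Reads the token back out of a link, decoding it in the process.
function readToken(link) {
  return new URL(link).searchParams.get("token");
}

// Hypothetical endpoint and a made-up token containing characters
// ("+", "/", "=") that are not safe to place in a URL unencoded.
const endpoint = "https://example.execute-api.us-east-1.amazonaws.com/Prod/allow";
const taskToken = "AAAAKgAAAAIAAAAA+abc/def==";

const link = buildCallbackLink(endpoint, taskToken);
console.log(readToken(link) === taskToken); // true
```

Encoding the token this way means it survives the round trip unchanged, so no quote stripping or other cleanup is needed on the receiving side.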
The lifecycle of the task token as it transitions through each service is shown below:
- To invoke this callback pattern, the askUser state definition is declared using the .waitForTaskToken identifier, with the task token passed into the Lambda function as a payload parameter:
"AskUser": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
  "Parameters": {
    "FunctionName": "${AskUser}",
    "Payload": {
      "token.$": "$$.Task.Token"
    }
  },
  "ResultPath": "$.taskresult",
  "Next": "usersChoice"
},
- The askUser Lambda function can then access this token within the event object:
exports.handler = async (event, context) => {
  // Append the task token to each API endpoint as the "token" query string parameter
  let approveLink = `${process.env.APIAllowEndpoint}?token=${JSON.stringify(event.token)}`
  let denyLink = `${process.env.APIDenyEndpoint}?token=${JSON.stringify(event.token)}`
  //code continues
- The task token is published to an SNS topic along with the message text parameter:
let params = {
  TopicArn: process.env.Topic,
  Message: `A restricted Policy change has been detected Approve:${approveLink} Or Deny:${denyLink}`
}
let res = await sns.publish(params).promise()
//code continues
- The administrator receives an email with two links, one to approve and one to deny. The task token is appended to these links as a request query string parameter named token:
- Using the Amazon API Gateway proxy integration, the task token is passed directly from the API resource to the recieveUser Lambda function. It is accessible within the function code as part of the event’s queryStringParameters object:
exports.handler = async (event, context) => {
  //some code
  let taskToken = event.queryStringParameters.token
  //more code
- The token is then sent back to the askUser state via an API call from within the recieveUser Lambda function. This API call also defines the next course of action for the workflow to take.
//some code
let params = {
  output: JSON.stringify({"action":NextAction}),
  taskToken: taskTokenClean
}
let res = await stepfunctions.sendTaskSuccess(params).promise()
//code continues
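The final steps above can be sketched end to end as a small handler. This is an illustration under stated assumptions, not the repository's code. `buildTaskSuccessParams` and `createHandler` are hypothetical names, the "approve" action value is an example, and the quote stripping is a guess at how a cleaned token such as taskTokenClean might be derived from a token that was passed through JSON.stringify. The Step Functions client is passed in as a parameter and replaced here by a stub that mimics the sendTaskSuccess(params).promise() call shape, so the sketch runs without AWS credentials.

```javascript
// Builds the sendTaskSuccess parameters from an API Gateway proxy event.
// Assumption: the token arrives wrapped in the double quotes added by
// JSON.stringify, so the leading and trailing quote are removed.
function buildTaskSuccessParams(event, nextAction) {
  const rawToken = event.queryStringParameters.token;
  const taskToken = rawToken.replace(/^"|"$/g, "");
  return { output: JSON.stringify({ action: nextAction }), taskToken };
}

// Creates a handler bound to one action (for example, one handler per API
// resource). The client is injected so it can be swapped for a stub.
function createHandler(stepfunctions, nextAction) {
  return async (event) => {
    const params = buildTaskSuccessParams(event, nextAction);
    await stepfunctions.sendTaskSuccess(params).promise();
    return { statusCode: 200, body: `Recorded action: ${nextAction}` };
  };
}

// Stub client that records calls instead of contacting AWS.
const sentParams = [];
const stubStepFunctions = {
  sendTaskSuccess: (params) => ({
    promise: async () => {
      sentParams.push(params);
      return {};
    },
  }),
};

const approveHandler = createHandler(stubStepFunctions, "approve");
const sampleRequest = { queryStringParameters: { token: '"abc123"' } };

const pending = approveHandler(sampleRequest).then((response) => {
  console.log(response.statusCode, sentParams[0]);
  return response;
});
```

Because the AskUser state sets ResultPath to $.taskresult, the output sent here would appear in the state data at $.taskresult.action, which is the value a subsequent Choice state can branch on.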
Each Step Functions execution can last for up to a year, allowing for long wait periods for the administrator to take action. There is no extra cost for a longer wait time as you pay for the number of state transitions, and not for the idle wait time.
Conclusion
Using EventBridge to route IAM policy creation events directly to AWS Step Functions removes the need for additional communication layers. It also promotes good use of compute resources, ensuring that Lambda is used to transform data, and not to transport or orchestrate it.
Using Step Functions to invoke services sequentially has two important benefits for this application. First, you can identify the use of restricted policies quickly and automatically. Second, these policies can be removed and held in a ‘pending’ state until approved.
Step Functions Standard Workflow’s callback pattern can create a robust orchestration layer that allows administrators to review each change before approving or denying.
For more information on other Step Functions patterns, see our documentation on integration patterns.