AWS Storage Blog
Automating AWS Backup pre- and post-script execution with AWS Step Functions
Customers execute custom scripts before or after a backup job to automate and orchestrate required and repetitive tasks. For example, customers running applications hosted on Amazon Elastic Compute Cloud (Amazon EC2) instances use scripts to complete application transactions, flush the buffers and caches, stop file I/O operations, or ensure that the application is idle, bringing the application data to a consistent state prior to taking a backup. Manually coordinating these scripts is complex and time-consuming with steep operational overhead, especially for customers with a large fleet of applications running on EC2 instances.
In this post, the first of a two-post series, we walk through building an orchestration workflow for data protection with AWS Backup, Amazon EC2, AWS Step Functions, AWS Lambda, and AWS Systems Manager. We show how to orchestrate pre- and post-script execution on Linux- or Windows-based EC2 instances by leveraging the AWS Systems Manager Run Command feature. The orchestration through AWS Step Functions also provides the ability to start or stop an EC2 instance during the backup job using APIs. In the second post, we will step through the execution paths and how to troubleshoot any failures.
Solution overview
AWS Step Functions serves as the orchestration layer in this solution and is responsible for coordinating multiple AWS serverless services to achieve the desired outcome. The state machine built for this solution performs various tasks based on customized user needs. At a high level, the state machine processes an input passed by the user, in which they specify the parameters used for the job execution.
Based on the input parameters, the workflow goes through multiple steps:
- It integrates with Systems Manager to run the defined scripts.
- It calls Amazon EC2 APIs to stop or start the EC2 instance and calls the AWS Backup API to initiate an on-demand backup (a CLI sketch of this call follows Figure 1).
- Finally, once the orchestration is complete, the entire log of that particular execution is stored in an Amazon DynamoDB table for auditing.
Figure 1: High-Level Architecture
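For reference, the on-demand backup step corresponds to the AWS Backup StartBackupJob API, which the state machine calls for you during the workflow. The following is a minimal AWS CLI sketch of that call; the vault name, account ID, instance ID, and IAM role ARN are placeholders.
# Illustration only: the state machine issues this call as part of the workflow.
# Vault name, account ID, instance ID, and role ARN below are placeholders.
aws backup start-backup-job \
    --backup-vault-name Default \
    --resource-arn arn:aws:ec2:us-east-1:111122223333:instance/i-0123456789abcdef0 \
    --iam-role-arn arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole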
Prerequisites
To implement this solution on your end, you should meet the following prerequisites:
- An AWS Account.
- Running EC2 instances in the Region you choose to deploy the solution. The instances need an EC2 instance profile with the AmazonSSMManagedInstanceCore and AmazonS3ReadOnlyAccess managed policies attached.
- An Amazon Simple Storage Service (Amazon S3) bucket.
- High-level understanding of AWS Serverless Application Model (AWS SAM).
- EC2 instances with a specific tag to identify them as part of the workflow.
- The code from aws-samples to deploy the solution.
Environment setup
To simplify the deployment of this solution in your environment, we use AWS SAM, which lets you define and model an application using YAML. During deployment, AWS SAM transforms and expands the AWS SAM syntax into AWS CloudFormation syntax using CloudFormation macros managed by AWS. To learn more about macros, refer to this blog.
Also, in this walkthrough, we use the AWS Cloud9 IDE, which provides a seamless experience for developing serverless applications. It has a preconfigured development environment that includes AWS Command Line Interface (AWS CLI), AWS SAM CLI, AWS SDKs, code libraries, and many useful plugins. Follow this guide to set up AWS Cloud9 in your AWS environment.
Step 1: Create an S3 Bucket
In your Cloud9 built-in terminal, run the following command to create an S3 bucket or re-use an existing bucket to store the scripts used by the solution. Make sure to replace BUCKET_NAME with the name you would like to use.
aws s3api create-bucket --bucket BUCKET_NAME
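If your default Region is not us-east-1, the S3 API also requires an explicit location constraint; the Region below is only an example.
# only needed outside us-east-1; replace the Region with the one you deploy in
aws s3api create-bucket --bucket BUCKET_NAME \
    --create-bucket-configuration LocationConstraint=us-west-2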
Step 2: Upload the scripts to the S3 Bucket
In this step, we create a couple of simple shell scripts to demonstrate the script execution functionality, and then we store them in the S3 bucket created in the previous step.
Run the following BASH commands in your Cloud9 built-in terminal:
# create a pre-script
cat > pre-script.sh <<'EOF'
#!/bin/bash

echo "Executing the pre-script"
date
echo "Completed executing the pre-script"
EOF
# put the script in the bucket
aws s3api put-object --bucket BUCKET_NAME --key scripts/pre-script.sh --body pre-script.sh
# create a post-script
cat > post-script.sh <<'EOF'
#!/bin/bash

echo "Executing the post-script"
date
echo "Completed executing the post-script"
EOF
# put the script in the bucket
aws s3api put-object --bucket BUCKET_NAME --key scripts/post-script.sh --body post-script.sh
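Optionally, verify that both scripts landed in the bucket before moving on:
# list the uploaded scripts
aws s3 ls s3://BUCKET_NAME/scripts/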
Step 3: Validate tags on EC2 instances
The Step Functions state machine uses tags to identify the target EC2 instances that are part of the automation workflow. In this example, we use the following JSON tag:
"Key: "backup-job-pre-post-script",
"Value": "true"
Ensure that the EC2 instances you want to make part of your backup orchestration workflow have the proper tag you wish to use.
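If an instance is missing the tag, you can add it with the AWS CLI; the instance ID below is a placeholder.
# tag an instance so the workflow can find it (placeholder instance ID)
aws ec2 create-tags --resources i-0123456789abcdef0 \
    --tags Key=backup-job-pre-post-script,Value=true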
Step 4: Validate or install the Systems Manager agent
The Systems Manager Agent is Amazon software that runs on EC2 instances, edge devices, and on-premises servers and virtual machines (VMs). In this solution, the Systems Manager agent lets us retrieve the scripts from the S3 bucket and manage their execution.
To validate the Systems Manager agent status and start the agent, refer to the following link.
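As a quick check on a systemd-based distribution such as Amazon Linux 2, you can query and start the agent directly (service names may differ on other operating systems):
# check whether the SSM Agent is running, and start it if needed
sudo systemctl status amazon-ssm-agent
sudo systemctl start amazon-ssm-agent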
If you don’t have the Systems Manager agent installed in your EC2 instance, see Install SSM Agent for a hybrid environment (Linux) or Install SSM Agent for Windows Server.
Step 5: Validate the EC2 instance role
As stated in the prerequisites, the EC2 instance requires an IAM role that includes the AmazonSSMManagedInstanceCore and AmazonS3ReadOnlyAccess managed policies so that the Systems Manager agent can connect and retrieve the scripts from the S3 bucket.
For more information on how to use an instance profile to pass an IAM role to an EC2 instance, see Using instance profiles in Amazon EC2.
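To confirm the policies are attached, you can inspect the role from the AWS CLI; the role name below is a placeholder for the role referenced by your instance profile.
# list the managed policies attached to the instance's role (placeholder role name)
aws iam list-attached-role-policies --role-name ec2-backup-prepost-script-role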
Step 6: Code download
Download the code from the aws-samples Git repository using the following command:
git clone https://github.com/aws-samples/aws-backup-prepost-script-sample.git
Step 7: Deploy the solution using AWS SAM
Run the following commands to begin the code deployment using AWS SAM.
cd aws-backup-prepost-script-sample
sam deploy --guided
Use the following arguments when prompted during the build process:
Figure 1.2: Guided configuration for SAM Deploy
Running the solution
The following sections cover the components that start the backup automation workflow, which can run on demand or fully automated on a defined schedule using Amazon EventBridge rules.
Amazon EventBridge is a powerful service for building loosely coupled, event-driven architectures. We leverage its capabilities in this solution to demonstrate how you can trigger the workflow both reactively and proactively. For example, you can have security guardrails around your EC2 instance configurations by creating an AWS Config rule; any violation of this rule can automatically trigger this workflow to initiate backups. You can also start this solution in reaction to AWS CloudTrail Insights events when unusual activity arises in your AWS CloudTrail management or data events. Furthermore, you can integrate this into your deployment processes to take a backup before a major release. The flexibility of combining the workflow with EventBridge through scheduled or custom events opens up various possibilities in an event-driven architecture.
The solution creates a custom event bus in the deployed region where ad-hoc events are triggered using rules in EventBridge. As shown in the architecture diagram (Figure 1), the Step Functions state machine, which orchestrates the backup workflow, can be triggered in three ways.
1. Scheduled invocation
The workflow execution kicks off from a schedule, where a pre-configured input invokes the state machine. The scheduled rule (with a cron expression) uses the 'default' event bus.
- The code to create the example scheduled rule is included in the sample code. This method runs your workflows on a predefined schedule that you can align with your scheduled maintenance windows.
- Refer here to enable the scheduled rule (a CLI sketch follows this list).
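As a quick reference, enabling the rule from the AWS CLI looks like the following; the rule name is a placeholder for the scheduled rule created by the SAM stack.
# enable the scheduled rule created by the stack (placeholder rule name)
aws events enable-rule --name backup-prepost-script-scheduled-rule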
2. Custom invocation
The state machine can also be triggered by publishing a custom event to the custom event bus. Use this rule for any ad-hoc invocation of the workflow, such as anticipated downtime, an incident within the account, or other automation in place that must trigger a backup. You can publish the event to the custom event bus via the AWS CLI, AWS SDK, or the AWS Management Console, which makes this method useful for integrating the workflow into your existing architectures. Once you run the AWS SAM deploy to create the custom event rule mentioned above, you can use the following steps to quickly test the custom invocation path of the solution in the console (a CLI sketch follows the console steps).
1. Open Amazon EventBridge in the AWS Management Console.
2. Select Event buses from the left-side menu, as shown in Figure 2.1.
3. Select Send events in the top-right corner, as shown in Figure 2.1.
Figure 2.1: Amazon EventBridge console
4. Provide the details for the custom event as shown in Figure 2.2.
- Event source: This can be any string; provide a value appropriate for the use case so that backup administrators can use it for auditing.
- Example: aws-config-rule-non-compliance-trigger
- Detail type: This must be ‘BackupSolution-PrePostScript’.
- Note that other AWS services might also send events to this custom event bus. To filter for the event that should trigger the workflow, the rule matches a ‘detail-type’ value of BackupSolution-PrePostScript. You can use a different string to filter this event by modifying the detail-type value here and running AWS SAM deploy to apply the change.
- Event detail: The input JSON used to trigger the State Machine. Use the example event in Step 7 of the README.md in the git repository.
Figure 2.2: Custom event entry
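To publish the same event from the AWS CLI instead of the console, a call along the following lines works; the event bus name is a placeholder for the custom bus created by the stack, and the Detail payload should be the example input from Step 7 of the README.md.
# publish a custom event that matches the BackupSolution-PrePostScript rule
# (placeholder event bus name; replace the empty Detail payload with the example input)
aws events put-events --entries '[{
    "EventBusName": "BACKUP_SOLUTION_EVENT_BUS",
    "Source": "aws-config-rule-non-compliance-trigger",
    "DetailType": "BackupSolution-PrePostScript",
    "Detail": "{}"
}]'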
3. Manually invoking the state machine
You can directly invoke the state machine from the console by passing your custom input. An example input is provided in Step 7 of the README.md. Follow these steps to run the state machine workflow from the AWS Management Console manually.
1. Navigate to the CloudFormation console and select the stack you created in Step 6 in the README.md.
2. Switch to the Resources section of the selected stack and select the State Machine ARN (Figure 3.1).
Figure 3.1: AWS CloudFormation console
3. Select Start execution (Figure 3.2).
Figure 3.2: Step Functions console
4. Provide the input you want to pass to the state machine execution (Figure 3.3), update the indicated paths in the figure with the appropriate values, and select Start execution. You can find an example of the input in Step 7 of the README.md.
Figure 3.3: Diagram showing state machine invocation from console
5. Once the workflow execution completes, you can check the execution path followed for that particular execution (Figure 3.4). Note that, for the specific input provided, the route followed is:
- Execute Pre Script -> Stop EC2 Instance -> Run Backup Job -> Execute Post Script -> Log Execution details in DynamoDB table.
- Refer to the properties section for combinations you can try based on your use case.
Figure 3.4: Graph view of state machine execution
6. Verify the execution in the DynamoDB table. Go to the Resources tab of your CloudFormation stack (Figure 3.1), scroll to the bottom, and select the Physical ID of the TransactionTable. This navigates you to the DynamoDB console.
7. Select Explore tables (Figure 3.5) and verify the entries. An entry for your state machine execution should be available here.
Figure 3.5: Amazon DynamoDB console
The state machine can also be invoked through the AWS CLI or SDK.
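For example, a minimal AWS CLI invocation looks like the following; the state machine ARN is a placeholder for the ARN shown in the stack's Resources section, and input.json holds the example input from Step 7 of the README.md.
# start an execution with the example input (placeholder state machine ARN)
aws stepfunctions start-execution \
    --state-machine-arn arn:aws:states:us-east-1:111122223333:stateMachine:BackupPrePostScriptStateMachine \
    --input file://input.json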
Cleaning up
To avoid incurring future charges, delete the resources created by this solution.
- Run sam delete in the project folder to delete all resources.
- The S3 buckets, the Backup vault, and the DynamoDB table are not deleted when the stack is removed with sam delete. These resources are retained for the Amazon EC2 backups and to keep historical state machine execution data. To avoid incurring costs for these resources, you must delete them manually.
- Delete the S3 bucket that stores the scripts and the Run Command logs. To delete the S3 bucket, you must empty its contents first. Refer to Deleting a bucket (a CLI sketch follows this list).
- Delete the DynamoDB table that stores the execution logs. Refer to Deleting a DynamoDB Table.
- Delete the backup vault that holds the EC2 instance backup data. All recovery points must be deleted before you can delete the backup vault. Refer to Deleting a backup vault.
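For the S3 bucket, a quick way to empty and remove it from the AWS CLI is shown below; the bucket name is the one you created in Step 1.
# empty the bucket, then delete it
aws s3 rm s3://BUCKET_NAME --recursive
aws s3api delete-bucket --bucket BUCKET_NAME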
Conclusion
In this first post, we demonstrated how to leverage an event-driven architecture in combination with the serverless capabilities that AWS services provide to orchestrate your Amazon EC2 backups using AWS Backup. You can leverage this solution to build processes that are specific to your organization, integrating with your existing backup strategies as a starting point for feature-rich backup orchestration workflows. Moreover, you can apply similar orchestration approaches to build backup workflows for other AWS Backup supported resources, such as DynamoDB, Amazon Relational Database Service (Amazon RDS), and more.
In the second post, we will cover the troubleshooting aspects of this solution. For additional information, refer to AWS X-Ray and AWS Step Functions.
Thanks for reading this blog post. To learn more about AWS Backup, visit the AWS Backup Developer Guide. If you have any questions or comments, please leave them in the comments section.