AWS Developer Tools Blog
Orchestrating an application process with AWS Batch using AWS CDK
In many real-world applications, you can use custom Docker images with AWS Batch and the AWS Cloud Development Kit (CDK) to execute complex jobs efficiently. AWS CDK is an open source software development framework to model and provision your cloud application resources using familiar programming languages, including TypeScript, JavaScript, Python, C#, and Java. For the solution in this blog, we will use C# for the infrastructure code. This is a sequel to an earlier published blog with the same infrastructure and concept. In this blog, we will leverage the capabilities and features of AWS CDK (with Microsoft .NET using C#) instead of CloudFormation. Let’s get started!
Overview
This post provides a file processing implementation using Docker images and Amazon S3, AWS Lambda, Amazon DynamoDB, and AWS Batch. In this scenario, the user uploads a CSV file into an Amazon S3 bucket, and AWS Batch processes the file as a job. These jobs are packaged as Docker containers and executed using Amazon EC2 and Amazon ECS. The following steps provide an overview of this implementation:
- AWS CDK bootstraps and deploys a CloudFormation template.
- The infrastructure launches the S3 bucket that stores the CSV files. Other AWS services required for this application orchestration are also spun up.
- The Amazon S3 file event notification executes an AWS Lambda function that starts an AWS Batch job.
- AWS Batch executes the job as a Docker container.
- A Python-based program reads the contents of the S3 bucket, parses each row, and updates an Amazon DynamoDB table.
- Amazon DynamoDB stores each processed row from the CSV.
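The Lambda step in this flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the function shipped in the repository: the job queue and job definition names are the ones listed for the deployed stack in this post, while the environment variable names (S3_BUCKET, S3_KEY), the helper names, and the injectable batch_client parameter are hypothetical and exist only for this example.

```python
import re
from urllib.parse import unquote_plus

# Names as listed for the deployed stack in this post; they change with STACK_PREFIX.
JOB_QUEUE = "netcore-batch-job-queue"
JOB_DEFINITION = "netcore-batch-job-definition"


def build_submit_job_args(record, job_queue=JOB_QUEUE, job_definition=JOB_DEFINITION):
    """Turn one S3 event record into keyword arguments for Batch submit_job."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["s3"]["object"]["key"])
    # Batch job names allow letters, numbers, hyphens, and underscores (max 128 chars).
    job_name = re.sub(r"[^A-Za-z0-9_-]", "-", "csv-" + key)[:128]
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Hypothetical variable names; the container code must read the same ones.
        "containerOverrides": {
            "environment": [
                {"name": "S3_BUCKET", "value": bucket},
                {"name": "S3_KEY", "value": key},
            ]
        },
    }


def handler(event, context, batch_client=None):
    """Lambda entry point: submit one Batch job per uploaded object."""
    if batch_client is None:
        import boto3  # provided by the Lambda Python runtime
        batch_client = boto3.client("batch")
    job_ids = []
    for record in event.get("Records", []):
        response = batch_client.submit_job(**build_submit_job_args(record))
        job_ids.append(response["jobId"])
    return {"jobIds": job_ids}
```

Passing the bucket and key as container environment variables is one possible design; job parameters would work as well. The client is injectable so that the handler can be exercised locally with a stand-in object.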
Prerequisites
- Make sure Docker is installed and running on your machine. For instructions, see Docker Desktop and Desktop Enterprise.
- Set up your AWS CLI. For steps, see Getting Started (AWS CLI).
Solution Steps
The following steps outline this walk-through. Detailed steps are given throughout the rest of this post.
- Download the code and run the .NET build and AWS CDK deployment commands (provided below) to create the necessary infrastructure.
- Drop the CSV file into the S3 bucket. A sample file, Sample.csv, is provided and will be parsed by the Batch job.
- Confirm that the job runs and performs the operation based on the pushed container image. The job parses the CSV file and adds each row into DynamoDB.
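The parsing step that the job performs can be sketched roughly as below. This is an illustration, not the processor code from the repository: the function names are hypothetical, the CSV is assumed to have a header row, and the table object is assumed to be a boto3 DynamoDB Table resource (or anything exposing the same batch_writer() interface).

```python
import csv
import io


def csv_to_items(csv_text):
    """Parse CSV text (with a header row) into one dict per non-empty row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    items = []
    for row in reader:
        # Trim whitespace and drop empty cells so only real values are stored.
        item = {k.strip(): v.strip() for k, v in row.items() if k and v and v.strip()}
        if item:
            items.append(item)
    return items


def store_items(table, items):
    """Write items through a DynamoDB Table resource (or a compatible stand-in)."""
    # batch_writer() buffers the puts and sends them to DynamoDB in batches.
    with table.batch_writer() as writer:
        for item in items:
            writer.put_item(Item=item)
    return len(items)
```

In a real run, the table could be obtained with boto3.resource("dynamodb").Table("netcore-cdk-batch-app-table"). Each item must contain the table's key attribute(s); this post does not state which CSV column serves as the key, so that mapping is left open here.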
Once the code is downloaded, take a moment to see how CDK simplifies spinning up infrastructure using C# code. You may use Visual Studio Code or your IDE of choice to open the folder (aws-netcore-app-using-batch). The provided source code consists of the following major components. The path of each component within the download is noted in parentheses.
- Microsoft .NET AWS CDK to build the infrastructure (aws-netcore-app-using-batch/code/src/MyApp)
- Python Lambda code as part of resources for CDK to build (aws-netcore-app-using-batch/code/resources)
- Python Batch processor code and Dockerfile for the AWS Batch to execute (aws-netcore-app-using-batch/code/src/BatchProcessor)
Open the file “/aws-netcore-app-using-batch/cdk/src/MyApp/SrcStack.cs”. The code below (a snippet from the GitHub solution) spins up a VPC for the required CIDR range and number of Availability Zones.
IVpc todoVpc = new Vpc(this, Constants.VPC_NAME, new VpcProps {
    Cidr = Constants.CIDR_RANGE,
    MaxAzs = 4
});
Similarly, developers can implement their own CDK constructs to define and deploy an AWS service. In the snippet below, the Batch service role is created by a custom construct that exposes an IAM role. See the full source in /aws-netcore-app-using-batch/cdk/src/MyApp/Modules/BatchServiceRole.cs
public sealed class BatchServiceRole: Construct {
    ...
    ...
    public BatchServiceRole(Construct scope, string id): base(scope, id){
        this.scope = scope;
        this.id = id;
        this.Role = GetRole(
            ...
            ...
        );
    }
}
Points to consider
- The provided sample uses the AWS CDK for Microsoft .NET (C#) instead of CloudFormation. Any other programming language supported by CDK can be used. Please refer to the AWS CDK Documentation.
- The sample packages Python code for the Lambda function. This can be written in any other available programming language.
- For the AWS Batch processing, Python code and a Dockerfile are provided, from which the CDK builds image assets. The container image is packaged and pushed as part of the CDK command execution. The Python code runs from the image in ECR, parses the S3 file, and pushes its rows to DynamoDB. The provided sample uses Python, but any other programming language that can be containerized will work.
- As part of this walkthrough, you use optimal instances for the Batch compute environment. The a1.medium instance is a less expensive instance type introduced for batch operations, but you can use any AWS Batch capable instance type according to your needs.
- To handle a higher volume of CSV file contents, you can use multithreading or multiprocessing to complement the scaling that AWS Batch provides.
- .NET Core 3.1 and AWS CDK version 1.32.2 (developer preview for .NET) are used, as available at the time of writing. Newer versions may bring new implementations and functionality. Watch for newer releases, which may remove obsolete implementations.
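The multithreading suggestion above could look roughly like the sketch below. It is an assumption-laden illustration rather than code from the repository: the helper names are hypothetical, and write_chunk stands for whatever function writes a list of rows and returns how many it handled. The default chunk size of 25 matches the per-request item limit of DynamoDB's BatchWriteItem operation.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_parallel(items, write_chunk, chunk_size=25, max_workers=4):
    """Call write_chunk(chunk) for every chunk on a thread pool; return items handled."""
    chunks = list(chunked(items, chunk_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() re-raises the first worker exception when results are consumed.
        counts = list(pool.map(write_chunk, chunks))
    return sum(counts)
```

Threads suit this workload because the time is spent waiting on network calls. If the work per row were CPU-heavy, a process pool would likely be the better fit.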
When deployed, the AWS CDK creates the following infrastructure.
You can download the source from GitHub. The steps below use the downloaded code, which contains the AWS CDK source that spins up the infrastructure, the Python code for the Lambda function, and the Python application (.py file) for the Batch job. A sample CSV file is also provided for the Batch job to process. You can optionally clone the repository with the git command below. The cloned folder becomes your SOURCE_REPOSITORY.
$ git clone https://github.com/aws-samples/aws-netcore-app-using-batch
$ cd aws-netcore-app-using-batch
$ dotnet build src
$ cdk bootstrap
$ cdk deploy --require-approval never
NOTE: Optionally, you can use the “run.sh” script provided in the “cdk” folder of the code base, which takes care of the above steps.
Once the preceding CDK deploy command completes successfully, two CloudFormation stacks are created. Take a moment to identify the major components. The CloudFormation stack spins up the following resources, which can be viewed in the AWS Management Console. The resources created by the sample stack are listed below. The name prefix can be changed through the STACK_PREFIX variable in the “Constants.cs” file (netcore is the default prefix in the provided sample).
- CloudFormation
  - CDK Toolkit
  - Stack – netcore-cdk-batch-app
- EC2 (Compute Infrastructure)
  - netcore-batch-compute-environment-asg-<guid>
- S3 Bucket
  - netcore-batch-processing-job-<account_number>
- ECR
  - netcore-cdk-batch-app-repository
- Dynamo Table
  - netcore-cdk-batch-app-table
- AWS Batch
  - Job Queue – netcore-batch-job-queue
  - Job Definition – netcore-batch-job-definition
- Lambda Function
  - netcore-lambda-batch-processing-function
Using CDK constructs, we have built the above infrastructure and integrated the solution with a Public Load Balancer. The output of this stack will give the API URLs for health check and API validation. As you may notice, by defining the solution using CDK, you were able to:
- Use object-oriented techniques to create a model of your system
- Organize your project into logical modules
- Save time with code completion in your IDE
Other major advantages of the CDK approach are that a developer or development team can:
- Use logic (if statements, for-loops, etc) when defining your infrastructure
- Define high level abstractions, share them, and publish them to your team, company, or community
- Share and reuse your infrastructure as a library
- Test your infrastructure code using industry-standard protocols
- Use your existing code review workflow
Testing
- In the AWS Console, select “CloudFormation”, open the stack, and select the S3 bucket that was created as part of the stack.
- This will be something like – netcore-batch-processing-job-<account-number>
- Upload the sample CSV file provided as part of the code.
- The processed CSV file rows are stored in the DynamoDB table as output.
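If you prefer to confirm the job outcome from a script rather than the console, a polling helper along these lines may help. It is a sketch under assumptions: the function name and defaults are hypothetical, the job ID is assumed to be known already (for example from the AWS Batch console), and batch_client is assumed to be a boto3 Batch client or a stand-in with the same describe_jobs method.

```python
import time

# Batch jobs end in one of these two states.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_job(batch_client, job_id, poll_seconds=15, timeout_seconds=900, sleep=time.sleep):
    """Poll describe_jobs until the job succeeds, fails, or the timeout elapses."""
    waited = 0
    while True:
        jobs = batch_client.describe_jobs(jobs=[job_id])["jobs"]
        status = jobs[0]["status"] if jobs else "UNKNOWN"
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

A return value of SUCCEEDED indicates the container exited cleanly; the DynamoDB table can then be checked for the parsed rows.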
Code Cleanup
The AWS CDK destroy command deletes the CloudFormation stack and the AWS resources that were created as part of the stack. Resources from AWS services that apply a “Retain” policy on stack deletion need to be deleted manually. This can be done either in the AWS Console or using the AWS CLI (commands provided below).
Using AWS Console
- Open AWS Console, select “S3”, navigate to the bucket created as part of the stack and delete the S3 bucket manually.
- Similarly, within the AWS Console, open “ECR” and “DynamoDB” and delete the repository and table that were created by this stack.
- In AWS Console, look for “CloudFormation”, select “CDKToolkit” stack
- Go to “Resources” tab, select the staging s3 bucket
- Select all the contents & delete the contents manually
Using AWS CLI
$ cdk destroy
# CLI Commands to delete the S3, Dynamo and ECR repository
$ aws s3 rb s3://netcore-batch-processing-job-<account_number> --force
$ aws ecr delete-repository --repository-name netcore-cdk-batch-app-repository --force
$ aws dynamodb delete-table --table-name netcore-cdk-batch-app-table
Conclusion
You were able to launch an application process that uses AWS Batch and integrates with various AWS services. Depending on the scalability needs of the application, AWS Batch can process jobs both faster and more cost efficiently. The post walked through deploying Lambda and batch processing code with infrastructure as code using the AWS CDK for Microsoft .NET.
I encourage you to test this example and see for yourself how this overall orchestration works with AWS Batch. Then, it is just a matter of replacing the Python code with your own (in Python or any other programming language), packaging it as a Docker container, and letting AWS Batch handle the process efficiently. If you decide to give it a try, have any questions, or want to let me know what you think about the post, please leave a comment!
References
* AWS Cloud Development Kit(CDK)
* AWS CDK .NET API Reference
* Microsoft .Net Core
* Windows on AWS
* Docker Containers
If you are interested in a .NET Web API with an Amazon Aurora database using AWS CDK, refer to this blog.
About the author
Sivasubramanian Ramani (Siva Ramani) is a Sr Cloud Application Architect at AWS. His expertise is in application optimization, serverless solutions and using Microsoft application workloads with AWS.