Orchestrating an application process with AWS Batch using AWS CDK

In many real-world applications, you can use custom Docker images with AWS Batch and the AWS Cloud Development Kit (CDK) to execute complex jobs efficiently. AWS CDK is an open source software development framework for modeling and provisioning cloud application resources using familiar programming languages, including TypeScript, JavaScript, Python, C#, and Java. For the solution in this blog, we will use C# for the infrastructure code. This is a sequel to an earlier blog post that uses the same infrastructure and concept. In this blog, we will leverage the capabilities and features of AWS CDK (with Microsoft .NET using C#) instead of CloudFormation. Let’s get started!

Overview

This post provides a file processing implementation using Docker images and Amazon S3, AWS Lambda, Amazon DynamoDB, and AWS Batch. In this scenario, the user uploads a CSV file into an Amazon S3 bucket, and AWS Batch processes the file as a job. These jobs are packaged as Docker containers and executed using Amazon EC2 and Amazon ECS. The following steps provide an overview of this implementation:

AWS CDK bootstraps and deploys a CloudFormation template.
The infrastructure launches the S3 bucket that stores the CSV files. Other AWS services required for this application orchestration are also spun up.
The Amazon S3 file event notification executes an AWS Lambda function that starts an AWS Batch job.
AWS Batch executes the job as a Docker container.
A Python-based program reads the contents of the S3 bucket, parses each row, and updates an Amazon DynamoDB table.
Amazon DynamoDB stores each processed row from the CSV.
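
To make the Lambda step above concrete, here is a minimal Python sketch of a handler that turns an S3 event notification into an AWS Batch job submission. It is not the code shipped in the repository. The job queue and job definition names are taken from the sample resource names in this post, while the environment variable names INPUT_BUCKET and INPUT_KEY are assumptions made for illustration.

```python
import re
from urllib.parse import unquote_plus

# Sample names from this post; the real values come from the deployed stack.
JOB_QUEUE = "netcore-batch-job-queue"
JOB_DEFINITION = "netcore-batch-job-definition"


def build_submit_job_requests(event, job_queue=JOB_QUEUE, job_definition=JOB_DEFINITION):
    """Turn an S3 event notification into keyword arguments for batch.submit_job."""
    requests = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Keep only letters, digits, hyphens and underscores, and cap the length at 128.
        job_name = re.sub(r"[^A-Za-z0-9_-]", "-", f"process-{key}")[:128]
        requests.append({
            "jobName": job_name,
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            "containerOverrides": {
                # Hypothetical variable names: the container reads them to find the file.
                "environment": [
                    {"name": "INPUT_BUCKET", "value": bucket},
                    {"name": "INPUT_KEY", "value": key},
                ]
            },
        })
    return requests


def lambda_handler(event, context):
    # Imported lazily so the request builder can be exercised without AWS access.
    import boto3

    batch = boto3.client("batch")
    job_ids = [batch.submit_job(**request)["jobId"]
               for request in build_submit_job_requests(event)]
    return {"submittedJobIds": job_ids}
```

Keeping the request construction in a separate function means it can be unit tested with a hand-written event, without calling AWS.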

Prerequisites

Make sure Docker is installed and running on your machine. For instructions, see Docker Desktop and Desktop Enterprise.
Set up your AWS CLI. For steps, see Getting Started (AWS CLI).

Solution Steps

The following steps outline this walk-through. Detailed steps are given throughout the rest of this post.

Download the code and run the .NET build and AWS CDK deployment commands (provided below) to create the necessary infrastructure.
Drop the CSV file into the S3 bucket (a sample file, Sample.csv, is provided; the Batch job parses it).
Confirm that the job runs and performs the operation based on the pushed container image. The job parses the CSV file and adds each row to DynamoDB.
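
The parsing step in the last item can be sketched in Python as follows. This is an illustrative outline rather than the repository's batch processor. The column names, the "Id" attribute, and the choice to drop empty values are assumptions, and write_items expects a boto3 DynamoDB Table resource.

```python
import csv
import io
import uuid


def rows_to_items(csv_text, id_attribute="Id"):
    """Parse CSV text and return one DynamoDB-style item (a dict) per data row."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = {}
        for name, value in row.items():
            # DictReader uses None for surplus columns and for missing values; skip both.
            if name is None or value is None:
                continue
            if value.strip():
                item[name.strip()] = value.strip()
        if not item:
            continue  # nothing usable on this row
        # Assumption: rows without an id column get a generated one.
        item.setdefault(id_attribute, str(uuid.uuid4()))
        items.append(item)
    return items


def write_items(table, items):
    """Write items through a boto3 DynamoDB Table resource using its batch writer."""
    with table.batch_writer() as writer:
        for item in items:
            writer.put_item(Item=item)
```

A call such as write_items(boto3.resource("dynamodb").Table("netcore-cdk-batch-app-table"), rows_to_items(text)) would then store the rows, assuming the table's key attribute matches id_attribute.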

Once the code is downloaded, take a moment to see how CDK simplifies spinning up infrastructure using C# code. You may use Visual Studio Code or your favorite IDE to open the folder (aws-netcore-app-using-batch). The provided source code consists of the following major components; the path of each within the download is noted in parentheses.

Microsoft .NET AWS CDK to build the infrastructure (aws-netcore-app-using-batch/code/src/MyApp)
Python Lambda code as part of resources for CDK to build (aws-netcore-app-using-batch/code/resources)
Python Batch processor code and Dockerfile for the AWS Batch to execute (aws-netcore-app-using-batch/code/src/BatchProcessor)

Open the file “/aws-netcore-app-using-batch/cdk/src/MyApp/SrcStack.cs”. The code below (a snippet from the GitHub solution) spins up a VPC with the required CIDR range and maximum number of Availability Zones.

IVpc todoVpc = new Vpc(this, Constants.VPC_NAME, new VpcProps{
    Cidr = Constants.CIDR_RANGE,
    MaxAzs = 4
});

Similarly, developers can implement their own CDK constructs to define and deploy an AWS service. In the snippet below, the Batch service role is created by a class that extends Construct and exposes an IAM role. See the full source in /aws-netcore-app-using-batch/cdk/src/MyApp/Modules/BatchServiceRole.cs.

public sealed class BatchServiceRole: Construct {
    ...
    ...
    public BatchServiceRole(Construct scope, string id): base(scope, id) {
        this.scope = scope;
        this.id = id;
        this.Role = GetRole(
            ...
            ...
        );
    }
}

Points to consider

The provided sample uses AWS CDK with Microsoft .NET (C#) instead of CloudFormation. Any other programming language supported by CDK can be used. Please refer to the AWS CDK documentation.
The sample includes Python code that is packaged as the Lambda function. This can be written in any other language that Lambda supports.
For the AWS Batch processing, Python code and a Dockerfile are provided, from which the CDK builds image assets. The container image is packaged and pushed as part of the CDK deployment. AWS Batch runs this image from Amazon ECR; the Python code parses the S3 file and writes its rows to DynamoDB. The sample uses Python, but any language whose code can be containerized will work.
As part of this walkthrough, you use the optimal instance type setting for the Batch compute environment. The a1.medium instance is a less expensive instance type introduced for batch operations, but you can use any AWS Batch capable instance type according to your needs.
To handle a higher volume of CSV file contents, you can use multithreading or multiprocessing to complement the scale that AWS Batch provides.
This post uses .NET Core 3.1 and AWS CDK version 1.32.2 (the .NET support was in developer preview at the time of writing). Newer versions may add functionality and may remove obsolete implementations, so watch for newer releases.
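
As one way to picture the multithreading point above, the following Python sketch splits parsed rows into chunks and hands each chunk to a worker thread. It is a sketch under assumptions, not part of the sample: write_chunk is a hypothetical callable that would wrap the actual DynamoDB write, and the chunk size of 25 mirrors the per-request limit of the DynamoDB BatchWriteItem operation.

```python
from concurrent.futures import ThreadPoolExecutor

# BatchWriteItem accepts at most 25 put/delete requests per call.
DYNAMODB_BATCH_LIMIT = 25


def chunked(items, size=DYNAMODB_BATCH_LIMIT):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_in_parallel(items, write_chunk, max_workers=4):
    """Call write_chunk(chunk) for every chunk on a thread pool.

    Returns the number of items handed to write_chunk. An exception raised
    by any worker propagates to the caller when the results are collected.
    """
    def run(chunk):
        write_chunk(chunk)
        return len(chunk)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(run, chunked(items)))
```

Threads fit this workload because the time is spent waiting on network calls. If write_chunk uses boto3, it is safer for each worker to create its own session or resource object rather than share one across threads.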

When deployed, the AWS CDK creates the following infrastructure.

You can download the source from GitHub. The steps below use the downloaded code, which contains the AWS CDK source that spins up the infrastructure, the Python code for the Lambda function, and the Python batch application (.py file). A sample CSV file is also provided for the batch job to process. You can optionally use the git command below to clone the repository. This becomes your SOURCE_REPOSITORY.

$ git clone https://github.com/aws-samples/aws-netcore-app-using-batch
$ cd aws-netcore-app-using-batch
$ dotnet build src
$ cdk bootstrap
$ cdk deploy --require-approval never

NOTE: Optionally, you can use the “run.sh” bash script provided in the “cdk” folder of the code base, which takes care of the above steps. Once the preceding CDK deploy command completes successfully, two CloudFormation stacks are created. Take a moment to identify the major components. The stacks spin up the following resources, which can be viewed in the AWS Management Console. The names below are samples; the prefix can be changed through the STACK_PREFIX variable in the “Constants.cs” file (netcore is the default prefix in the provided sample).

CloudFormation
- CDK Toolkit
- Stack – netcore-cdk-batch-app
EC2 (Compute Infrastructure)
- netcore-batch-compute-environment-asg-<guid>
S3 Bucket
- netcore-batch-processing-job-<account_number>
ECR
- netcore-cdk-batch-app-repository
Dynamo Table
- netcore-cdk-batch-app-table
AWS Batch
- Job Queue – netcore-batch-job-queue
- Job Definition – netcore-batch-job-definition
Lambda Function
- netcore-lambda-batch-processing-function
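
If you change the prefix, a small helper like the Python sketch below can list the names to look for in the console or to pass to CLI commands. It assumes the prefix is simply prepended in the pattern that the sample names above suggest; how Constants.cs actually composes the names is not shown in this post, so verify the output against your deployed stack.

```python
def expected_resource_names(prefix="netcore", account_id="<account_number>"):
    """Return the sample resource names from this post for a given prefix.

    Assumption: every name follows the pattern of the default "netcore" sample.
    """
    app = f"{prefix}-cdk-batch-app"
    return {
        "stack": app,
        "s3_bucket": f"{prefix}-batch-processing-job-{account_id}",
        "ecr_repository": f"{app}-repository",
        "dynamodb_table": f"{app}-table",
        "batch_job_queue": f"{prefix}-batch-job-queue",
        "batch_job_definition": f"{prefix}-batch-job-definition",
        "lambda_function": f"{prefix}-lambda-batch-processing-function",
    }


if __name__ == "__main__":
    for kind, name in expected_resource_names().items():
        print(f"{kind}: {name}")
```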

Using CDK constructs, we have built the above infrastructure. As you may notice, by defining the solution using CDK, you were able to:

Use object-oriented techniques to create a model of your system
Organize your project into logical modules
Save time with code completion in your IDE

Other major advantages of the CDK approach are that, as a developer or development team, you are able to:

Use logic (if statements, for-loops, etc) when defining your infrastructure
Define high level abstractions, share them, and publish them to your team, company, or community
Share and reuse your infrastructure as a library
Test your infrastructure code using industry-standard protocols
Use your existing code review workflow

Testing

In the AWS Console, select “CloudFormation”, then select the S3 bucket that was created as part of the stack.
The bucket name will be something like netcore-batch-processing-job-<account-number>.
Drop the sample CSV file provided as part of the code into the bucket.
The processed CSV file rows are stored in the DynamoDB table as output.
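
To check the result without browsing the console, one option is to compare the number of data rows in the CSV file with the number of items in the table. The Python sketch below is an assumption-laden helper, not part of the sample: count_table_items expects a boto3 DynamoDB Table resource, and the two counts only match if the table was empty beforehand and every row maps to a distinct key.

```python
import csv
import io


def count_csv_rows(csv_text):
    """Number of data rows (header excluded) in CSV text."""
    return sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))


def count_table_items(table):
    """Count items with a paginated Scan; `table` is a boto3 DynamoDB Table resource."""
    total = 0
    kwargs = {"Select": "COUNT"}
    while True:
        page = table.scan(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return total
        # Continue the scan from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Example usage (requires AWS credentials and the deployed stack):
#   import boto3
#   table = boto3.resource("dynamodb").Table("netcore-cdk-batch-app-table")
#   with open("Sample.csv") as handle:
#       print(count_csv_rows(handle.read()), count_table_items(table))
```

A Scan reads the whole table, so this check is only sensible for small test data such as the sample file.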

Code Cleanup

The AWS CDK destroy command deletes the CloudFormation stack and the AWS resources that were created as part of the stack. Resources with a “Retain” policy on stack deletion are kept and need to be deleted manually. This can be done either in the AWS Console or using the AWS CLI (commands provided below).

Using AWS Console

Open the AWS Console, select “S3”, navigate to the bucket created as part of the stack, and delete the S3 bucket manually.
Similarly, within the AWS Console, look for the “ECR” repository and the “DynamoDB” table that were created as part of the stack, and delete them.
In the AWS Console, look for “CloudFormation” and select the “CDKToolkit” stack.
Go to the “Resources” tab and select the staging S3 bucket.
Select all the contents and delete them manually.

Using AWS CLI

$ cdk destroy

# CLI commands to delete the S3 bucket, ECR repository, and DynamoDB table

$ aws s3 rb s3://netcore-batch-processing-job- --force

$ aws ecr delete-repository --repository-name netcore-cdk-batch-app-repository --force

$ aws dynamodb delete-table --table-name netcore-cdk-batch-app-table

Conclusion

You were able to launch an application process that uses AWS Batch and integrates with various AWS services. Depending on the scalability needs of the application, AWS Batch can process jobs both quickly and cost-efficiently. The post walked through deploying Lambda and batch processing code with infrastructure as code using Microsoft .NET and AWS CDK.

I encourage you to test this example and see for yourself how this overall orchestration works with AWS Batch. Then, it is just a matter of replacing the Python code with your own (in any programming language), packaging it as a Docker container, and letting AWS Batch handle the process efficiently. If you decide to give it a try, have any questions, or want to let me know what you think about the post, please leave a comment!

References

* AWS Cloud Development Kit (CDK)
* AWS CDK .NET API Reference
* Microsoft .NET Core
* Windows on AWS
* Docker Containers

If you are interested in a .NET Web API with an Amazon Aurora database using AWS CDK, refer to this blog.

About the author

Sivasubramanian Ramani (Siva Ramani) is a Sr Cloud Application Architect at AWS. His expertise is in application optimization, serverless solutions, and using Microsoft application workloads with AWS.

AWS Developer Tools Blog