Optimize your Spring Boot application for AWS Fargate

Update: Spring Boot has been updated to version 3, which also means that Amazon Corretto 17 is used as JDK for all versions.

Fast startup times are key to reacting quickly to disruptions and demand peaks, and they can improve resource efficiency. With AWS Fargate, you don’t need to manage the underlying container hosts; however, some changes are often needed to shorten the time it takes to bootstrap your container and the application.

This post describes optimization techniques to be applied to Java applications that run on Fargate. We specifically look at Java Spring Boot applications in this post, but these optimizations can also be applied to other types of containerized Java applications.

The demonstration application code, which shows the different implementations, is available on GitHub.

Solution overview

Our example application is a simple REST-based Create Read Update Delete (CRUD) service that implements basic customer management functionalities. All data is persisted in an Amazon DynamoDB table accessed using the AWS SDK for Java V2.

The REST functionality is located in the class CustomerController, which uses the Spring @RestController annotation. This class invokes CustomerService, which uses the Spring data repository implementation CustomerRepository. This repository implements the functionality to access an Amazon DynamoDB table with the AWS SDK for Java V2. All user-related information is stored in a Plain Old Java Object (POJO) called Customer.
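The layering described above can be sketched in plain Java. This is only an illustration, not the code from the repository: the Spring annotations are left out, an in-memory map stands in for the DynamoDB table, and the fields of Customer (id and name) as well as all method names are assumptions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class LayeringSketch {

    // Hypothetical fields; the real Customer POJO may look different.
    record Customer(String id, String name) {}

    // Stand-in for CustomerRepository: an in-memory map replaces the DynamoDB table.
    static final class CustomerRepository {
        private final Map<String, Customer> table = new ConcurrentHashMap<>();

        void save(Customer customer) { table.put(customer.id(), customer); }
        Optional<Customer> findById(String id) { return Optional.ofNullable(table.get(id)); }
        void deleteById(String id) { table.remove(id); }
    }

    // Stand-in for CustomerService: delegates persistence to the repository.
    static final class CustomerService {
        private final CustomerRepository repository;
        CustomerService(CustomerRepository repository) { this.repository = repository; }

        Customer create(String id, String name) {
            Customer customer = new Customer(id, name);
            repository.save(customer);
            return customer;
        }
        Optional<Customer> read(String id) { return repository.findById(id); }
        void delete(String id) { repository.deleteById(id); }
    }

    // Stand-in for CustomerController: in the real application this layer maps HTTP requests.
    static final class CustomerController {
        private final CustomerService service;
        CustomerController(CustomerService service) { this.service = service; }

        String getCustomerName(String id) {
            return service.read(id).map(Customer::name).orElse("not found");
        }
    }

    public static void main(String[] args) {
        CustomerService service = new CustomerService(new CustomerRepository());
        CustomerController controller = new CustomerController(service);
        service.create("42", "Jane Doe");
        System.out.println(controller.getCustomerName("42"));
        System.out.println(controller.getCustomerName("43"));
    }
}
```

Because each layer only talks to the one below it, the persistence code can be swapped (for example, enhanced client versus standard client) without touching the controller.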

The following architecture diagram presents an overview of the solution.

Infrastructure showing a VPC, an ALB, the ECS cluster with AWS Fargate, ECR, and Amazon DynamoDB

Figure 1. Architecture diagram of the solution

For our tests, we created seven different versions of our application:

  • Version 1, not optimized, running on x86_64
  • Version 2, not optimized, running on ARM64
  • Version 3, custom Java runtime environment (JRE) and additional optimizations running on x86_64
  • Version 4, custom JRE and additional optimizations running on ARM64
  • Version 5, Spring Native (GraalVM AoT compilation) running on x86_64 with Ubuntu 22 parent image
  • Version 6, Spring Native (GraalVM AoT compilation) running on ARM64 with Ubuntu 22 parent image
  • Version 7, Spring Native (GraalVM AoT compilation) running on x86_64 with distroless parent image

Prerequisites

You will need the following to complete the steps in this post:

Walkthrough

Multi-arch container images

Multi-arch (or multi-architecture) refers to container images for different processor architectures built from the same code. There are multiple ways to create multi-arch images. In this post, we use QEMU emulation to quickly create the multi-arch images. If you plan to use multi-arch images for more than testing purposes, consider building your images with a proper CI/CD pipeline. The first step is to install the Docker Buildx CLI plugin. This step isn’t necessary if you’ve installed Docker Desktop, which includes buildx and emulators out of the box.

export DOCKER_BUILDKIT=1
docker build --platform=local -o . https://github.com/docker/buildx.git

mkdir -p ~/.docker/cli-plugins
mv buildx ~/.docker/cli-plugins/docker-buildx
chmod a+x ~/.docker/cli-plugins/docker-buildx

We install the emulators to build and run containers for ARM64 on an Amazon EC2 or AWS Cloud9 instance:

docker run --privileged --rm tonistiigi/binfmt --install all

In the next step, we start with a new builder:

docker buildx create --name SpringBootBuild --use
docker buildx inspect --bootstrap

Now we build a multi-arch image with docker buildx. In the following command, you can see that we specify two different architectures: amd64 and arm64. A multi-arch manifest is generated and pushed to an Amazon ECR registry. Note that for the measurements, each version is kept strictly separate and all versions are executed at the same time, so each directory is used to build and push the images for its corresponding architecture.

docker buildx build --platform linux/amd64,linux/arm64 --tag <account-id>.dkr.ecr.<region>.amazonaws.com/<your-repo>:latest --push .

In the task definition for Amazon Elastic Container Service (ECS) with Fargate, we specify the parameter cpuArchitecture (valid values are X86_64 and ARM64) to run the task on the desired CPU architecture. We will go into this in more detail in a later section.

Setting up the infrastructure

In the previous steps, we built a container image and pushed it to an Amazon ECR repository. Now, we set up the basic infrastructure consisting of an Amazon Virtual Private Cloud (Amazon VPC), an Amazon ECS cluster with Fargate launch type, a DynamoDB table, and an Application Load Balancer (ALB).

Codifying your infrastructure allows you to treat it like any other code. In this post, we use the AWS CDK, an open-source software development framework, to model and provision cloud application resources using familiar programming languages. The code for the AWS CDK application can be found in the demonstration application’s code repository under cdkapp/lib/cdkapp-stack.ts.

The following commands set up the infrastructure in the AWS Region eu-west-1 for the first version of our application:

$ npm install -g aws-cdk # Install the CDK if this hasn’t been installed already
$ cd cdkapp
$ npm install # retrieves dependencies for the CDK stack
$ npm run build # compiles the TypeScript files to JavaScript
$ cdk bootstrap
$ cdk deploy CdkappStack --parameters containerImage=<your_repo/your_image:tag> --context cpuType=X86_64

As shown in the last AWS CDK command, it is possible to define the CPU architecture for the Amazon ECS task definition, with the possible values of X86_64 and ARM64.

The output of the AWS CloudFormation stack is the ALB’s Domain Name System (DNS) record. The heart of our infrastructure is an Amazon ECS cluster with Fargate as the launch type, which this AWS CDK application sets up. Depending on the context (X86_64 or ARM64), a task definition for the Amazon ECS task is created with the matching CPU architecture, 1 vCPU, and 2 GB of memory. In addition, we create an AWS Fargate service, which is exposed through an ALB. This service also provides a health check that is implemented using Spring Boot Actuator.

Performance considerations

Let’s investigate the impact of using the different optimizations in comparison to the regular build of our sample Java application.

Our references for the performance measurement are the first and second versions of the application. Both versions implement the same logic with the same dependencies; only the CPU architecture differs. The dependencies include the full AWS SDK for Java, the DynamoDB enhanced client, and Lombok. Project Lombok is a code generation library for Java that minimizes boilerplate code. The DynamoDB enhanced client is a high-level library that is part of the AWS SDK for Java version 2 and offers a straightforward way to map client-side classes to DynamoDB tables. It lets you intuitively perform various create, read, update, or delete (CRUD) operations on tables or items in DynamoDB. More information about the Amazon DynamoDB enhanced client and examples can be found in the AWS SDK for Java documentation.

In addition, we use Tomcat as the web container and Java 11. In our Dockerfile, we use Ubuntu 22.04 as the parent image and install a full Amazon Corretto 11 Java Development Kit (JDK). These choices result in a container image of considerable size (in our case, over 900 MB), which has a negative effect on the pull time of the image from the registry as well as on the startup time of the application.

In the second iteration of the application (versions three and four), we apply several optimizations. We reduce the number of dependencies by including only the required AWS SDK dependencies. In addition, we replace Tomcat with Undertow, which is a more lightweight and performant web container. For access to Amazon DynamoDB, we remove the DynamoDB enhanced client and use only the standard client.

For this version, we use Amazon Corretto 17 and build our own runtime using jdeps and jlink as part of our multi-stage build process of the container image:

# Determine the JDK modules that the application and its libraries depend on
RUN jdeps --ignore-missing-deps \
    --multi-release 17 --print-module-deps \
    --class-path target/BOOT-INF/lib/* \
    target/CustomerService-0.0.1.jar > jre-deps.info

# Build a custom runtime that contains only the modules listed in jre-deps.info
RUN export JAVA_TOOL_OPTIONS="-Djdk.lang.Process.launchMechanism=vfork" && \
    jlink --verbose --compress 2 --strip-java-debug-attributes \
    --no-header-files --no-man-pages --output custom-jre \
    --add-modules $(cat jre-deps.info)

With jdeps, we generate the list of JDK modules that the application needs and write this list to jre-deps.info. This file serves as input for jlink, a tool that creates a custom JRE from a list of modules. In our Dockerfile, we use Ubuntu 22.04 as the parent image and copy our custom JRE to the target container image. By limiting the number of dependencies and building a custom JRE, we reduce the size of our target image significantly (to about 200 MB). We start our application with the parameters -XX:TieredStopAtLevel=1 and -noverify. Stopping tiered compilation at level 1 reduces the time that the JVM spends optimizing and profiling your code, which improves the startup time. However, it has a negative impact on code that is executed many times, because that code is never compiled with the more aggressive optimizations of the higher tiers. The -noverify flag disables bytecode verification, which has a security implication: the class loader won’t check the behavior of the bytecode.
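To check what actually ended up in a runtime built this way, a small program can list the modules of the JVM it runs on. The following is a minimal sketch; the class name RuntimeModules is an assumption and the program is not part of the demonstration application.

```java
import java.util.List;
import java.util.stream.Collectors;

final class RuntimeModules {

    // Returns the sorted names of all modules resolved into the boot layer of the running JVM.
    static List<String> bootModuleNames() {
        return ModuleLayer.boot().modules().stream()
                .map(Module::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = bootModuleNames();
        System.out.println("Java version: " + Runtime.version());
        System.out.println(names.size() + " modules in the boot layer:");
        names.forEach(name -> System.out.println("  " + name));
    }
}
```

Run with the custom runtime, the list should be noticeably shorter than with a full JDK. The java launcher inside the custom runtime also accepts the --list-modules option, which prints the modules contained in the runtime image without running any application code.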

In the third iteration of our application (versions five, six, and seven; image sizes between 150 MB and 200 MB), we introduce GraalVM with Spring Native. With this change, you can compile Spring applications to native executables using the GraalVM native image compiler. GraalVM is a high-performance distribution of the JDK, and its native image compiler transforms bytecode into machine code ahead of time. This is done using static analysis of the code, which means that all the information must be available at compile time. Of course, this implies that you can’t generate code at runtime. For x86 and ARM, we chose Ubuntu 22.04 as the parent image because we want comparable results. To minimize the resulting container image, we create one additional configuration with quay.io/quarkus/quarkus-distroless-image as the parent image for x86 (quarkus-distroless-image isn’t available for ARM64 at the moment).

Measurement and results

We want to optimize the AWS services and architecture, so we measure the task readiness duration shown below for the AWS Fargate task. It can be calculated using the timestamp of the RunTask API call in AWS CloudTrail and the timestamp of the ApplicationReadyEvent in our Spring Boot application.

To measure the startup time, we use a combination of data from the task metadata endpoint and API calls to the control plane of Amazon ECS. Among other things, this endpoint returns the task ARN and the cluster name.
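As a rough illustration of how a container can query this endpoint, the following sketch reads the base URI from the environment variable ECS_CONTAINER_METADATA_URI_V4 (task metadata endpoint version 4) and requests the task-level document under /task. It is not the implementation used in the demonstration application: the class and method names are assumptions, and the regular expression is a deliberately naive replacement for a real JSON parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TaskMetadataSketch {

    // Environment variable that Amazon ECS injects into the containers of a task.
    static final String METADATA_ENV = "ECS_CONTAINER_METADATA_URI_V4";

    // The task-level document is served under the "/task" path of the base URI.
    static URI taskMetadataUri(String baseUri) {
        String trimmed = baseUri.endsWith("/")
                ? baseUri.substring(0, baseUri.length() - 1)
                : baseUri;
        return URI.create(trimmed + "/task");
    }

    // Naive extraction of the first string field with the given name; use a JSON parser in real code.
    static Optional<String> stringField(String json, String field) {
        Pattern pattern = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(json);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        String base = System.getenv(METADATA_ENV);
        if (base == null) {
            System.out.println(METADATA_ENV + " is not set; this does not look like an ECS task.");
            return;
        }
        HttpRequest request = HttpRequest.newBuilder(taskMetadataUri(base)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Cluster: " + stringField(response.body(), "Cluster").orElse("unknown"));
        System.out.println("TaskARN: " + stringField(response.body(), "TaskARN").orElse("unknown"));
    }
}
```

Outside of Amazon ECS the environment variable is missing, so the program only prints a hint. Inside a task it prints the two values mentioned above: the cluster name and the task ARN.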

We need this data to send DescribeTasks calls to the Amazon ECS control plane in order to receive the following metrics:

  • PullStartedAt: The timestamp for when the first container image pull began.
  • PullStoppedAt: The timestamp for when the last container image pull finished.
  • CreatedAt: The timestamp for when the container was created. This parameter is omitted if the container has not been created yet.
  • StartedAt: The timestamp for when the container started. This parameter is omitted if the container has not started yet.

The logic for pulling the necessary metrics is implemented in the EcsMetaDataService class.
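Once these timestamps are available, the individual phases are plain differences between two points in time. The sketch below shows the arithmetic with java.time; it is not the code of EcsMetaDataService, the type and method names are assumptions, and the timestamps in main are made-up sample values rather than measurements.

```java
import java.time.Duration;
import java.time.Instant;

final class StartupPhases {

    // Holder for the four timestamps listed above.
    record TaskTimestamps(Instant pullStartedAt, Instant pullStoppedAt,
                          Instant createdAt, Instant startedAt) {

        // Time spent pulling the container image(s).
        Duration pullDuration() {
            return Duration.between(pullStartedAt, pullStoppedAt);
        }

        // Time between creation and start.
        Duration createdToStarted() {
            return Duration.between(createdAt, startedAt);
        }
    }

    // Task readiness: from the RunTask API call until the ApplicationReadyEvent.
    static Duration taskReadiness(Instant runTaskCalledAt, Instant applicationReadyAt) {
        return Duration.between(runTaskCalledAt, applicationReadyAt);
    }

    public static void main(String[] args) {
        // Hypothetical sample values, for illustration only.
        Instant runTaskCalledAt = Instant.parse("2023-01-01T10:00:00Z");
        TaskTimestamps timestamps = new TaskTimestamps(
                Instant.parse("2023-01-01T10:00:10Z"),
                Instant.parse("2023-01-01T10:00:22Z"),
                Instant.parse("2023-01-01T10:00:01Z"),
                Instant.parse("2023-01-01T10:00:25Z"));
        Instant applicationReadyAt = Instant.parse("2023-01-01T10:00:31Z");

        System.out.println("Pull duration (s): " + timestamps.pullDuration().toSeconds());
        System.out.println("Created to started (s): " + timestamps.createdToStarted().toSeconds());
        System.out.println("Task readiness (s): "
                + taskReadiness(runTaskCalledAt, applicationReadyAt).toSeconds());
    }
}
```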

The different states of our Fargate tasks are shown in the following diagram.

Different states of the Fargate tasks as a diagram

Figure 2. Different states of our Fargate tasks

What effect did the changes have on our application? The following list is ordered by effectiveness and ease of implementation:

  • Reduce the image size: The container image size has the biggest impact on the task readiness time. The smaller the image is, the faster it gets pulled from the Amazon ECR repository and the faster the application starts. Our image for the unoptimized version of our Spring Boot application is over 900 MB (iteration 1), the optimized version with a custom JRE and minimized dependencies is 200 MB (iteration 2), and the third iteration with Spring Native is 200 MB with Ubuntu and 150 MB with the distroless image (iteration 3). The effect on the pull time is surprisingly high: from iteration 1 to iteration 2, about 75 percent less time was used. From iteration 2 to iteration 3, the impact is smaller, with 38 percent for the distroless-based image and 12 percent for the Ubuntu-based image. From iteration 1 to iteration 3, the reduction is 85 percent for the distroless-based image and 80 percent for the Ubuntu-based image.
  • Use a custom JRE: For the raw start time of the Java application (ApplicationReadyEvent and JVM startup time), we see a huge impact on performance for the different versions: from iteration 1 to iteration 2, about 78 percent less time was used.
  • Use Spring Native: Using GraalVM and native image has a tremendous impact on startup time. From iteration 2 to iteration 3, the startup time improved by 96 percent, which means we achieve a 99 percent improvement from iteration 1 to iteration 3.

When we take a closer look at the complete startup time, beginning with the RunTask API call and ending with the ApplicationReadyEvent, we see a performance gain of 58 percent from iteration 1 to iteration 2 and of 28 percent from iteration 2 to iteration 3. Overall, we observed an improvement of almost 70 percent from iteration 1 to iteration 3.
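The overall figure follows from chaining the two relative reductions: after a 58 percent reduction, 42 percent of the original time remains, and a further 28 percent reduction leaves 0.42 × 0.72 ≈ 0.30 of the original time, which is a reduction of roughly 69.8 percent. The following sketch reproduces this arithmetic; the class and method names are assumptions.

```java
import java.util.Locale;

final class CombinedReduction {

    // Each argument is a fractional reduction relative to the previous step (0.58 = 58 percent less time).
    static double combine(double... reductions) {
        double remaining = 1.0;
        for (double reduction : reductions) {
            remaining *= (1.0 - reduction);
        }
        return 1.0 - remaining;
    }

    public static void main(String[] args) {
        // Complete startup time: iteration 1 -> 2 (58 percent), iteration 2 -> 3 (28 percent).
        System.out.println(String.format(Locale.ROOT, "Complete startup: %.1f%%",
                100 * combine(0.58, 0.28)));
        // Raw application start time: iteration 1 -> 2 (78 percent), iteration 2 -> 3 (96 percent).
        System.out.println(String.format(Locale.ROOT, "Application start: %.1f%%",
                100 * combine(0.78, 0.96)));
    }
}
```

The same calculation applied to the raw application start time (78 percent, then 96 percent) yields about 99 percent, which matches the figure given in the list above.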

The startup duration results of our Spring Boot application are shown in the following diagram.

Performance results of the Spring Boot application with different configurations as box-chart

Figure 3. Startup duration results of our Spring Boot application

Tradeoffs

Some legacy libraries and dependencies don’t support the module system that was introduced with Java 9, which means it isn’t possible to build a custom JRE with jdeps and jlink. In such situations, it’s necessary to migrate to a library that supports the module system, which requires additional development effort.

GraalVM assumes that all code is known at the build time of the image, which means that no new code will be loaded at runtime. Consequently, not all applications can be optimized using GraalVM. For more information, read about the limitations in the GraalVM documentation. If the native image build for your application fails, then a fallback image is created that requires a full JVM to run. In addition, native-image compilation with GraalVM takes more time than a regular build, which impacts developer productivity.
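To make the build-time assumption more concrete, the following sketch shows the kind of code that a static analysis cannot fully resolve: the class to instantiate is only known when the program runs. On a regular JVM this works as is; for a native image, such reflective access typically needs additional reflection configuration so that the class is included in the executable. The class name DynamicLoadingSketch and the default argument are assumptions for illustration.

```java
final class DynamicLoadingSketch {

    // The class name arrives as a runtime value, so it cannot be determined by analyzing the code alone.
    static Object instantiate(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Falls back to a JDK class when no class name is passed on the command line.
        String className = args.length > 0 ? args[0] : "java.util.ArrayList";
        Object instance = instantiate(className);
        System.out.println("Created an instance of " + instance.getClass().getName());
    }
}
```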

Cleaning up

After you are finished, you can easily destroy these resources with a single command to save costs.

$ cdk destroy

Conclusion

In this post, we demonstrated the impact of different optimization steps on the startup time of a Spring Boot application running on Amazon ECS with Fargate. We started with a typical implementation with several unnecessary dependencies, a full JDK, and a huge parent image. We reduced dependencies and built our own JRE using jdeps and jlink. We adopted Spring Native and GraalVM to reduce the startup time and switched to a distroless parent image. For many developers, the variant with the custom JRE and the minimized dependencies is the best solution in terms of complexity of the changes and performance gains. This is especially true when the ARM instruction set with AWS Graviton2 is used. Graviton2 processors are custom built by AWS using 64-bit Arm Neoverse cores. Fargate, powered by Graviton2 processors, delivers up to 40 percent better price performance at 20 percent lower cost over comparable Intel x86-based Fargate for containerized applications.

We hope we’ve given you some ideas on how you can optimize your existing Java application to reduce startup time and memory consumption. Feel free to submit enhancements to the sample application in the source repository.