
A deep dive into Bottlerocket ECS Updater

Last month, we announced the general availability of the Amazon Elastic Container Service (Amazon ECS) optimized Bottlerocket AMI. Today, I would like to focus on the Bottlerocket ECS Updater. The ECS Updater is a service you can install into your ECS cluster that helps you keep your Bottlerocket container instances up to date.

Before I get into the details of how the service works, we need to cover how updates are applied in Bottlerocket (documented extensively in the Bottlerocket GitHub repo). This is quite different from what you might be used to. Most operating systems let you update individual software components one at a time, whereas Bottlerocket updates the entire operating system all at once.

Maintaining consistency across all your instances and their operating systems is a challenge. When you update an operating system one package at a time (for example, by running yum install -y <package_name> on Amazon Linux 2), each instance drifts away from its original state, and there is no real baseline operating system version that you can track. Bottlerocket solves this problem with an image-based approach: each update replaces the whole operating system with a specific, versioned image.

When Bottlerocket downloads an update from its repository, which is built on The Update Framework (TUF), and the update is ready to install, the update is written to a secondary partition. Bottlerocket then marks the newly updated partition as the primary partition and reboots into it, running the new version of Bottlerocket. The old image remains available on what is now the secondary, inactive partition. Bottlerocket also has a separate, writable portion of the filesystem designed for persistent user data, such as container images and volumes. This makes updates more predictable and provides a standard mechanism for quickly rolling back if you experience a problem.

TUF Repository diagram
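To make the A/B partition scheme concrete, here is a toy Python model of it. It is a minimal sketch, not Bottlerocket's implementation. The Host class, the partition labels A and B, and the version strings are all hypothetical, and a real host stores images on disk partitions rather than in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Host:
    """Toy model of a host with two OS partition sets, A and B (hypothetical)."""

    partitions: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"A": "1.1.1", "B": None}
    )
    active: str = "A"  # partition set the host currently boots from

    @property
    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    @property
    def version(self) -> Optional[str]:
        return self.partitions[self.active]

    def stage_update(self, new_version: str) -> None:
        # Write the new image to the inactive partition; the running OS is untouched.
        self.partitions[self.inactive] = new_version

    def reboot_into_update(self) -> None:
        # Mark the inactive partition as primary and "reboot" into it.
        # The previous image stays available on the now-inactive partition.
        if self.partitions[self.inactive] is None:
            raise RuntimeError("no staged image to boot into")
        self.active = self.inactive

    def rollback(self) -> None:
        # Rolling back is the same flip, pointing back at the previous image.
        if self.partitions[self.inactive] is None:
            raise RuntimeError("no previous image to roll back to")
        self.active = self.inactive


host = Host()
host.stage_update("1.1.2")  # hypothetical newer version
print(host.version)  # 1.1.1: nothing changes until the reboot
host.reboot_into_update()
print(host.version)  # 1.1.2
host.rollback()
print(host.version)  # 1.1.1
```

Because the old image is never overwritten while the new one is staged, rolling back in this model is a pointer flip rather than a reinstall. The persistent data partition is not modeled because neither operation touches it.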

Bottlerocket utilizes waves to stagger the deployments across your fleet of instances. Waves are the mechanism Bottlerocket uses to reduce the potential impact of bugs in an update. By default, Bottlerocket updates become visible to a progressively larger subset of hosts over the course of time to reduce the risk that a bug affects your whole cluster at once.
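Waves can be sketched as a lookup from a per-host seed to a start time. This is a simplified model, not Bottlerocket's code. It assumes that each host carries a random seed in the range 0-2047 (Bottlerocket stores it in the settings.updates.seed setting) and that a wave schedule maps seed ranges to delays after release. The WAVES table below is hypothetical.

```python
import bisect
import random

SEED_RANGE = 2048  # assumed: seeds run from 0 to 2047

# Hypothetical wave schedule: (first seed NOT in this wave, hours after release
# when the wave starts). Later waves cover more hosts but start later.
WAVES = [
    (128, 0),
    (512, 24),
    (1024, 48),
    (SEED_RANGE, 96),
]


def wave_start_hours(seed: int) -> int:
    """Hours after release at which a host with this seed can see the update."""
    if not 0 <= seed < SEED_RANGE:
        raise ValueError(f"seed must be in [0, {SEED_RANGE})")
    upper_bounds = [upper for upper, _ in WAVES]
    # bisect_right returns the first wave whose upper bound is greater than the seed.
    return WAVES[bisect.bisect_right(upper_bounds, seed)][1]


def update_visible(seed: int, hours_since_release: float) -> bool:
    return hours_since_release >= wave_start_hours(seed)


# Simulate a fleet of 1,000 hosts with random seeds.
rng = random.Random(0)
fleet = [rng.randrange(SEED_RANGE) for _ in range(1000)]
for hours in (0, 24, 48, 96):
    visible = sum(update_visible(seed, hours) for seed in fleet)
    print(f"{hours:>3}h after release: {visible} of {len(fleet)} hosts see the update")
```

With these hypothetical boundaries, about 6%, 25%, 50% and finally 100% of seeds fall into the waves that have started at each step. That matches the "progressively larger subset of hosts" behavior described above.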

These actions are executed as part of the automated process (which we will go into shortly) by running the following apiclient commands:

  • apiclient update check
  • apiclient update apply
  • apiclient reboot

The following diagram describes the process.

Bottlerocket updates process using apiclient commands

Because updates require a reboot and can be disruptive to running applications, our customers have asked for tighter integration between Bottlerocket and Amazon ECS, specifically for the option to make Amazon ECS more “Bottlerocket-aware,” and make use of container instance draining. This allows you to maintain maximum availability of your application without incurring downtime during operating system updates. It also allows you to perform regular and scheduled maintenance on the underlying operating system when needed.

The ECS Updater is an application that you deploy with an AWS CloudFormation template. The template creates the following resources in your account:

  • An ECS Fargate task definition for the Bottlerocket ECS Updater
  • A CloudWatch Events scheduled rule to execute the Bottlerocket ECS Updater
  • An IAM role for the Bottlerocket ECS Updater task itself, as well as roles for Fargate and CloudWatch Events
  • SSM documents to query and execute updates on Bottlerocket instances

The updater queries the ECS API to discover all the container instances in your cluster, filtering for Bottlerocket instances by reading the bottlerocket.variant attribute. For each Bottlerocket instance found, the updater executes an SSM document on the instance that checks for available updates with the apiclient update check command. When an update is available, the updater checks whether the tasks currently running on the container instance are part of a service and therefore eligible for replacement. If all the tasks are part of a service, the updater marks the container instance for draining and waits for the tasks to be successfully evacuated. It then runs an SSM document that downloads and applies the update and reboots the instance. Finally, it marks the container instance as active and moves on to the next one in the cluster.
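The discovery and eligibility checks can be sketched as two pure functions over data shaped like the ECS DescribeContainerInstances and DescribeTasks responses. This is an illustration, not the updater's source. It assumes that a task launched by an ECS service has a startedBy value beginning with ecs-svc/ (the service's deployment ID). The sample instance IDs are hypothetical. The Bottlerocket filter could also be pushed to ECS itself with a cluster query language expression such as attribute:bottlerocket.variant exists.

```python
from typing import Dict, Iterable, List

SERVICE_PREFIX = "ecs-svc/"  # assumed startedBy prefix for tasks launched by a service


def is_bottlerocket(container_instance: Dict) -> bool:
    """True if the container instance reports a bottlerocket.variant attribute."""
    return any(
        attr.get("name") == "bottlerocket.variant"
        for attr in container_instance.get("attributes", [])
    )


def safe_to_drain(tasks: Iterable[Dict]) -> bool:
    """True if every task was started by an ECS service and can be rescheduled.

    An instance with no tasks is trivially safe, since all() of nothing is True.
    """
    return all(task.get("startedBy", "").startswith(SERVICE_PREFIX) for task in tasks)


# Hypothetical, trimmed-down API responses.
instances: List[Dict] = [
    {"ec2InstanceId": "i-aaa", "attributes": [{"name": "bottlerocket.variant", "value": "aws-ecs-1"}]},
    {"ec2InstanceId": "i-bbb", "attributes": [{"name": "ecs.os-type", "value": "linux"}]},
]
print([i["ec2InstanceId"] for i in instances if is_bottlerocket(i)])  # ['i-aaa']

print(safe_to_drain([{"startedBy": "ecs-svc/1234567890"}]))  # True
print(safe_to_drain([{"startedBy": "ecs-svc/1234567890"}, {}]))  # False: standalone task
```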

ECS Updater query process diagram

  1. The AWS Fargate task is executed on a regular schedule.
  2. Update operations are executed on the instances through SSM documents.
  3. The updater queries the ECS API for all instances with the bottlerocket.variant attribute.
  4. The updater runs an SSM document on each instance that uses apiclient to check for updates.
  5. The updater checks whether the running tasks are part of an Amazon ECS service, then marks the instance as DRAINING.
  6. The updater runs an SSM document on each instance to download the update, apply it, and reboot the instance. Then, it marks the instance as active and moves on to the next instance in the cluster.
  7. All operations are logged to Amazon CloudWatch.
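The numbered steps amount to a sequential loop over the cluster's instances. The sketch below shows that control flow in Python under explicit assumptions. UpdaterBackend is a hypothetical interface standing in for the ECS and SSM calls, and FakeBackend is a stand-in used only to exercise the loop. It also assumes that an instance running non-service tasks is simply skipped on this run.

```python
from typing import Callable, List, Protocol


class UpdaterBackend(Protocol):
    """Hypothetical interface; a real updater would back these with ECS and SSM calls."""

    def bottlerocket_instances(self) -> List[str]: ...
    def update_available(self, instance: str) -> bool: ...
    def only_service_tasks(self, instance: str) -> bool: ...
    def drain_and_wait(self, instance: str) -> None: ...
    def apply_update_and_reboot(self, instance: str) -> None: ...
    def activate(self, instance: str) -> None: ...


def run_once(backend: UpdaterBackend, log: Callable[[str], None] = print) -> List[str]:
    """One scheduled run: update eligible instances one at a time, in order."""
    updated = []
    for instance in backend.bottlerocket_instances():      # step 3
        if not backend.update_available(instance):         # step 4
            log(f"{instance}: no update available")
            continue
        if not backend.only_service_tasks(instance):       # step 5 (assumed: skip)
            log(f"{instance}: skipped, has tasks not managed by a service")
            continue
        backend.drain_and_wait(instance)                    # step 5: DRAINING
        backend.apply_update_and_reboot(instance)           # step 6
        backend.activate(instance)                          # step 6: back to ACTIVE
        log(f"{instance}: updated")
        updated.append(instance)
    return updated


class FakeBackend:
    """In-memory stand-in that records calls instead of touching AWS."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def bottlerocket_instances(self) -> List[str]:
        return ["i-1", "i-2", "i-3"]

    def update_available(self, instance: str) -> bool:
        return instance != "i-3"

    def only_service_tasks(self, instance: str) -> bool:
        return instance != "i-2"

    def drain_and_wait(self, instance: str) -> None:
        self.calls.append(("drain", instance))

    def apply_update_and_reboot(self, instance: str) -> None:
        self.calls.append(("update", instance))

    def activate(self, instance: str) -> None:
        self.calls.append(("activate", instance))


print(run_once(FakeBackend()))  # logs one line per instance, then prints ['i-1']
```

Processing instances one at a time means at most one instance is draining at any moment. Assuming the cluster has spare capacity, the remaining instances can absorb the rescheduled tasks.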

Let’s look at how this works in practice.

Prerequisites

  • The AWS CLI with appropriate credentials
  • An ECS cluster
  • EC2 instances running Bottlerocket as part of the ECS cluster (with an outdated Bottlerocket version – for example 1.1.1)
  • A CloudWatch Logs log group for the ECS Updater logs
  • A subnet with internet access for the ECS Updater task to run in

I have an ECS cluster deployed with EC2 instances running Bottlerocket v1.1.1. If you need guidance on how to get started with Bottlerocket, you can find it here.

Download the bottlerocket-ecs-updater.yaml file to your local machine.

Deploying the ECS Updater stack requires the following parameters:

  • The name of the ECS cluster where you are running Bottlerocket container instances
  • The name of the CloudWatch Logs log group where the Bottlerocket ECS Updater will send its logs
  • At least one subnet ID that has internet access (which does not need to be shared with the rest of your cluster)

In my case, my cluster is named maishsk-bottlerocket, my CloudWatch Logs log group is bottlerocket-test, and the subnet that has internet access for the Fargate task is subnet-bc8993e6.

aws cloudformation deploy \
  --stack-name "bottlerocket-ecs-updater" \
  --template-file "./bottlerocket-ecs-updater.yaml" \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides \
    ClusterName="maishsk-bottlerocket" \
    Subnets="subnet-bc8993e6" \
    LogGroupName="bottlerocket-test"

Here is what it looks like in AWS CloudFormation after the stack has deployed:

Console view after CloudFormation deployment screenshot

Here you can see that my ECS cluster has three EC2 instances, each of which is running two ECS tasks.

Console view of ECS cluster, three EC2 instances and each of them are running two ECS tasks.

To see which version of Bottlerocket is running, you need to connect to the instance itself. In the AWS Management Console, open EC2, select one of your instances, and choose Actions → Connect.

Console view of Bottlerocket instances

Run apiclient -u /os | jq to get pretty-printed output of the OS version.

command line view of apiclient
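If you want to collect this version programmatically, for example from output gathered across many instances, the JSON is straightforward to parse. The sample below assumes the field names shown, in particular version_id. Treat the values as illustrative and confirm the shape against the output on your own instances.

```python
import json
from collections import Counter
from typing import Dict

# Assumed shape of `apiclient -u /os` output; values are illustrative.
SAMPLE_OS_JSON = """
{
  "arch": "x86_64",
  "build_id": "example",
  "pretty_name": "Bottlerocket OS 1.1.1",
  "variant_id": "aws-ecs-1",
  "version_id": "1.1.1"
}
"""


def os_version(raw: str) -> str:
    """Extract the OS version from the JSON returned for /os."""
    return json.loads(raw)["version_id"]


def versions_in_fleet(outputs: Dict[str, str]) -> Counter:
    """Count how many instances run each version, given instance ID -> raw JSON."""
    return Counter(os_version(raw) for raw in outputs.values())


print(os_version(SAMPLE_OS_JSON))  # 1.1.1
print(versions_in_fleet({"i-a": SAMPLE_OS_JSON, "i-b": SAMPLE_OS_JSON}))  # Counter({'1.1.1': 2})
```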

You can also execute an SSM document that runs apiclient -u /os against multiple hosts at the same time rather than one by one, if you would like.
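One way to fan the query out is the SSM SendCommand API. The sketch below only builds the request arguments, so it runs without AWS access. It assumes the AWS-RunShellScript document can reach apiclient on your instances (on Bottlerocket, SSM commands run in the control container). It also assumes a batch size of 50, which we believe matches the SendCommand limit on explicitly listed instance IDs. Verify both assumptions for your environment.

```python
from typing import Dict, Iterator, List


def send_command_batches(
    instance_ids: List[str],
    command: str = "apiclient -u /os",
    batch_size: int = 50,  # assumed per-call limit on InstanceIds
) -> Iterator[Dict]:
    """Yield keyword arguments for boto3's ssm.send_command, one dict per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(instance_ids), batch_size):
        yield {
            "InstanceIds": instance_ids[start:start + batch_size],
            "DocumentName": "AWS-RunShellScript",
            "Parameters": {"commands": [command]},
        }


# Hypothetical instance IDs for illustration.
ids = [f"i-{n:017x}" for n in range(120)]
for kwargs in send_command_batches(ids):
    print(len(kwargs["InstanceIds"]))  # 50, 50, then 20

# With credentials configured, each batch could be sent like this:
#   import boto3
#   ssm = boto3.client("ssm")
#   for kwargs in send_command_batches(ids):
#       ssm.send_command(**kwargs)
```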

By default, the ECS Updater will run once every 12 hours.

Console view of Rules summary

Let’s go through the logs to see what happens when the task is triggered. The logs will be in the Amazon CloudWatch Logs log group that you provided when you created the CloudFormation stack.

CloudWatch logs

The updater lists the instances in the cluster and then filters these instances for only those running Bottlerocket, and in my case, it found three.

updater cluster instances list

It runs the SSM document on each of the instances to check for an update and then calls the ECS API to drain the tasks from the instance.

Updater checking updates list

Console view of DRAINING status
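Waiting for the drain to finish is essentially a polling loop with a timeout. In the sketch below, running_tasks is a hypothetical callable. In a real deployment it might read runningTasksCount from the ECS DescribeContainerInstances response after the instance has been set to DRAINING with UpdateContainerInstancesState. The clock and sleep functions are injectable so the loop can be tested without waiting. The default timeout and poll interval are arbitrary choices, not the updater's settings.

```python
import time
from typing import Callable


def wait_for_drain(
    running_tasks: Callable[[], int],
    timeout_s: float = 600,
    poll_s: float = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the instance reports zero running tasks.

    Returns True once drained, or False if the timeout expires first.
    """
    deadline = clock() + timeout_s
    while True:
        if running_tasks() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


counts = iter([2, 1, 0])  # simulated task counts on successive polls
print(wait_for_drain(lambda: next(counts), sleep=lambda _: None))  # True
```

Returning False on timeout, rather than forcing the update, leaves the decision to the caller. One reasonable choice would be to return the instance to ACTIVE and try again on the next scheduled run.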

Once the tasks have been drained from the instance, the update begins.

Update status and logs

An SSM document is executed on the instance to start the update process and apply the update, and then the instance reboots.

Updates status and logs

When the instance comes back and reports healthy, the updater checks that the version now on the instance is in fact the expected new version. The instance is added back into the cluster, ready to accept tasks, and the updater marks the update as complete.

Completed updates log view

If you would like, you can verify this by connecting again to the instance through EC2.

command line screenshot
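The post-reboot verification reduces to comparing version strings. A plain string comparison is not enough, because "1.10.0" sorts before "1.9.0" as text, so the sketch below compares numeric tuples. The update_succeeded helper and its rule are illustrative assumptions: the instance must report exactly the expected version, and that version must be newer than the old one. The version strings are examples.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.1.1' into (1, 1, 1) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def update_succeeded(before: str, after: str, expected: str) -> bool:
    """Hypothetical post-reboot check for a single instance."""
    after_v = parse_version(after)
    return after_v == parse_version(expected) and after_v > parse_version(before)


print("1.10.0" > "1.9.0")                                # False: text comparison
print(parse_version("1.10.0") > parse_version("1.9.0"))  # True: numeric comparison
print(update_succeeded("1.1.1", "1.1.2", "1.1.2"))       # True
print(update_succeeded("1.1.1", "1.1.1", "1.1.2"))       # False: still on the old version
```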

The process continues until all the nodes in the cluster are updated.

To recap what we learned in this post:

  • We looked at how operating system updates work in Bottlerocket
  • We went into the details of how the ECS Updater works
  • We deployed the new Bottlerocket ECS Updater CloudFormation stack on an existing ECS cluster
  • We went through the events in CloudWatch Logs to see how the actions are logged during the process

Bottlerocket’s components are open source, as is its roadmap. Our intent is for Bottlerocket to be a collaborative community project, and we invite you to get involved, provide your feedback, and contribute directly. Check out our GitHub repository for discussion via issues and submit your contributions via a pull request.

Please join the #bottlerocket channel for informal interaction in the AWS Developer Slack. You can sign up to this channel here.