Containers

Automatically enable group metrics collection for Amazon EKS managed node groups

Introduction

Amazon Elastic Kubernetes Service (Amazon EKS) managed node groups automate the provisioning and lifecycle management of Kubernetes nodes (Amazon Elastic Compute Cloud (Amazon EC2) instances) for Amazon EKS Kubernetes clusters.

Managed nodes are provisioned as part of an Amazon EC2 Auto Scaling group that’s managed for you by Amazon EKS. Amazon EKS doesn’t enable group metrics collection for Auto Scaling groups created for managed nodes.

In issue 762 on the AWS Container Services roadmap, customers asked us to enable group metrics collection by default. This post provides a solution that enables Auto Scaling group metrics collection using AWS Lambda and AWS CloudTrail.

Auto Scaling group metrics

Customers use Auto Scaling group metrics to track changes in an Auto Scaling group and to set alarms on threshold values. Auto Scaling group metrics are available in the Auto Scaling console or the Amazon CloudWatch console. Once enabled, the Auto Scaling group sends sampled data to Amazon CloudWatch every minute. There is no charge for enabling these metrics.
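Once collection is enabled, you can also read these metrics programmatically. The following is a minimal sketch that assumes you pass in a boto3 CloudWatch client and the name of the Auto Scaling group behind your node group. It uses the CloudWatch GetMetricStatistics API to read one group metric (GroupDesiredCapacity by default) from the `AWS/AutoScaling` namespace over the last 15 minutes. The function name `recent_group_metric` and its defaults are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def recent_group_metric(cloudwatch, asg_name, metric_name="GroupDesiredCapacity", minutes=15):
    """Return (timestamp, average) pairs for one Auto Scaling group metric, oldest first.

    `cloudwatch` is a boto3 CloudWatch client, e.g. boto3.client("cloudwatch").
    The function name and defaults are illustrative, not part of the post's code.
    """
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/AutoScaling",  # namespace used by Auto Scaling group metrics
        MetricName=metric_name,
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": asg_name}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,  # group metrics are sampled every minute
        Statistics=["Average"],
    )
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]
```

An empty result usually means collection is not enabled yet or no data has been published for that window.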

By enabling Auto Scaling group metrics collection, you can monitor the scaling of managed node groups. Auto Scaling group metrics report the minimum, maximum, and desired size of an Auto Scaling group. You can create an alarm that fires when the number of nodes in a node group falls below the minimum size, which would indicate an unhealthy node group. Tracking node group size also helps you adjust the maximum size so your data plane doesn't run out of capacity.
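As an illustration of the "below minimum size" alarm, here is a minimal sketch that assumes a boto3 CloudWatch client. It alarms on the `GroupInServiceInstances` metric with a `LessThanThreshold` comparison against the node group's minimum size. You can read that value from `describe_nodegroup(...)["nodegroup"]["scalingConfig"]["minSize"]`. The alarm name pattern and the five-minute evaluation window are hypothetical choices. No `AlarmActions` (for example, an Amazon SNS topic) are set here, so add them if you want notifications.

```python
def put_below_min_alarm(cloudwatch, asg_name, min_size, alarm_name=None):
    """Create or update an alarm that fires when in-service nodes drop below min_size.

    `cloudwatch` is a boto3 CloudWatch client. The alarm name pattern and
    evaluation window below are illustrative assumptions.
    """
    params = {
        "AlarmName": alarm_name or f"{asg_name}-below-min-size",
        "Namespace": "AWS/AutoScaling",
        "MetricName": "GroupInServiceInstances",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Minimum",
        "Period": 60,  # matches the 1-minute group metric granularity
        "EvaluationPeriods": 5,  # assumption: 5 consecutive minutes below the minimum
        "Threshold": float(min_size),
        "ComparisonOperator": "LessThanThreshold",
    }
    cloudwatch.put_metric_alarm(**params)
    return params
```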

Solution overview

When you create a managed node group, AWS CloudTrail records the CreateNodegroup API call, and Amazon EventBridge receives it as an event. An Amazon EventBridge rule that matches the CreateNodegroup event triggers an AWS Lambda function, which enables group metrics collection for the Auto Scaling group associated with the managed node group.
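An EventBridge rule that matches this event might use an event pattern like the one below. This is a minimal sketch that assumes a boto3 EventBridge client (`boto3.client("events")`). The rule name is hypothetical, and the CDK stack in this post creates its own rule, so treat this only as an illustration of the matching logic. You can check the pattern against a sample event with the EventBridge TestEventPattern API.

```python
import json

# Matches CloudTrail-recorded calls to the Amazon EKS CreateNodegroup API.
CREATE_NODEGROUP_PATTERN = {
    "source": ["aws.eks"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["eks.amazonaws.com"],
        "eventName": ["CreateNodegroup"],
    },
}


def put_create_nodegroup_rule(events, rule_name):
    """Create or update an EventBridge rule using the pattern above.

    `events` is a boto3 EventBridge client; `rule_name` is any name you choose.
    A Lambda target still has to be attached separately (for example with put_targets).
    """
    return events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(CREATE_NODEGROUP_PATTERN),
        State="ENABLED",
    )
```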

Diagram showing the managed node group, CloudTrail and EventBridge components

The AWS Cloud Development Kit (AWS CDK) code provided in this post creates an Amazon EventBridge rule that forwards the CreateNodegroup event to an AWS Lambda function. The function extracts the cluster name and managed node group name from the event and calls the Amazon EKS DescribeNodegroup API (describe-nodegroup in the AWS CLI) to find the associated Auto Scaling group. It then enables group metrics collection on that Auto Scaling group.
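The flow described above can be sketched in Python as follows. This is not the function from the repository. It assumes that the CloudTrail record stores the cluster name under `requestParameters.name` and the node group name under `requestParameters.nodegroupName`. It also assumes the function waits (with the boto3 `nodegroup_active` waiter) until the node group is active so that its Auto Scaling group exists, which requires a Lambda timeout of up to 15 minutes. Verify the event fields against a real event in your account before relying on them.

```python
def names_from_event(event):
    """Return (cluster_name, nodegroup_name) from a CreateNodegroup CloudTrail event.

    Assumption: the cluster name (a URI parameter) appears as requestParameters.name
    and the node group name as requestParameters.nodegroupName.
    """
    params = event["detail"]["requestParameters"]
    return params["name"], params["nodegroupName"]


def enable_group_metrics(eks, autoscaling, cluster_name, nodegroup_name):
    """Look up the node group's Auto Scaling group(s) and enable 1-minute group metrics."""
    nodegroup = eks.describe_nodegroup(
        clusterName=cluster_name, nodegroupName=nodegroup_name
    )["nodegroup"]
    asg_names = [
        asg["name"] for asg in nodegroup.get("resources", {}).get("autoScalingGroups", [])
    ]
    for asg_name in asg_names:
        # Omitting the Metrics parameter enables all group metrics.
        autoscaling.enable_metrics_collection(
            AutoScalingGroupName=asg_name, Granularity="1Minute"
        )
    return asg_names


def handler(event, context):
    # boto3 is imported here so the helpers above can be tested without it;
    # the AWS Lambda Python runtime includes boto3.
    import boto3

    cluster_name, nodegroup_name = names_from_event(event)
    eks = boto3.client("eks")
    # Assumption: wait until the node group is ACTIVE so its Auto Scaling group exists.
    eks.get_waiter("nodegroup_active").wait(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name,
        WaiterConfig={"Delay": 30, "MaxAttempts": 26},  # about 13 minutes
    )
    return enable_group_metrics(eks, boto3.client("autoscaling"), cluster_name, nodegroup_name)
```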

The function looks for a specific tag on the managed node group. By default, the function enables Auto Scaling group metrics collection only when you create a new managed node group with the ASG_METRICS_COLLLECTION_ENABLED tag set to TRUE. You can customize the tag in the AWS Lambda function code.
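A tag check of this kind could look like the following minimal sketch. It reads the `tags` map from the `nodegroup` object that DescribeNodegroup returns. The tag key is spelled exactly as it appears in this post. Treating `true` and `TRUE` the same is a choice made for this sketch and may differ from the function in the repository.

```python
# Opt-in tag key and value used in this post (key spelled exactly as in the post).
TAG_KEY = "ASG_METRICS_COLLLECTION_ENABLED"
TAG_VALUE = "TRUE"


def metrics_requested(nodegroup, tag_key=TAG_KEY, tag_value=TAG_VALUE):
    """Return True when a DescribeNodegroup 'nodegroup' dict carries the opt-in tag.

    The value comparison ignores case (an assumption of this sketch).
    """
    tags = nodegroup.get("tags") or {}
    return str(tags.get(tag_key, "")).upper() == tag_value.upper()
```

Passing different `tag_key` and `tag_value` arguments is one way to customize the opt-in tag without editing the constants.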

Prerequisites

You need the following to complete the steps in this post:

  • AWS CDK 2.19 or later
  • Git
  • AWS Command Line Interface (AWS CLI) version 2 (for testing only)
  • Python 3.7 or later
  • An EKS cluster

Clone the code repository:

git clone https://github.com/aws-samples/containers-blog-maelstrom.git
cd containers-blog-maelstrom/eks-enable-asg-metrics

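Create a Python virtual environment, install the dependencies, and bootstrap AWS CDK (bootstrapping is only needed the first time you use AWS CDK in an account and Region):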

AWS_ACCOUNT_ID=[YOUR AWS ACCOUNT NUMBER]
AWS_REGION=us-east-2

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt

cdk bootstrap aws://${AWS_ACCOUNT_ID}/${AWS_REGION}

Deploy the stack to create the following resources:

  • An AWS Lambda function and an Amazon CloudWatch log group.
  • An Amazon EventBridge rule that matches CreateNodegroup and sends the event to the AWS Lambda function.
  • An AWS Identity and Access Management (IAM) role that allows the AWS Lambda function to write logs to Amazon CloudWatch, describe Amazon EKS node groups, and enable Auto Scaling group metrics collection.

cdk deploy
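For reference, the permissions described in the IAM role bullet roughly correspond to a policy document like the one below. This is a minimal sketch, not the policy the CDK stack generates. It assumes the standard `aws` partition and scopes the EKS permission to node groups in one account and Region. You can compare it with the role that `cdk deploy` creates in the IAM console.

```python
import json


def lambda_policy_document(region, account_id):
    """Return an illustrative least-privilege IAM policy for the function described above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # write function logs to CloudWatch Logs
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
            {   # read node group details to find the Auto Scaling group
                "Effect": "Allow",
                "Action": "eks:DescribeNodegroup",
                "Resource": f"arn:aws:eks:{region}:{account_id}:nodegroup/*",
            },
            {   # turn on group metrics collection
                "Effect": "Allow",
                "Action": "autoscaling:EnableMetricsCollection",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID; prints JSON suitable for the IAM console or CLI.
    print(json.dumps(lambda_policy_document("us-east-2", "111122223333"), indent=2))
```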

Validation

Create an Amazon EKS managed node group and enable Auto Scaling group metrics collection using tags. You can use the following AWS CLI command as a reference to create a managed node group.

aws eks create-nodegroup \
    --cluster-name [CLUSTER NAME] \
    --subnets [SUBNET ID(S)] \
    --node-role [IAM ROLE ARN FOR THE NODES] \
    --tags ASG_METRICS_COLLLECTION_ENABLED=TRUE \
    --nodegroup-name ASG-TEST
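If you script your tests in Python instead, the same request can be made with the boto3 EKS `create_nodegroup` call. This is a minimal sketch that assumes a boto3 EKS client and your own cluster name, subnet IDs, and node role ARN. The function name `create_test_nodegroup` is hypothetical.

```python
def create_test_nodegroup(eks, cluster_name, subnet_ids, node_role_arn, nodegroup_name="ASG-TEST"):
    """Create a managed node group that opts in to group metrics collection via its tag.

    `eks` is a boto3 EKS client, e.g. boto3.client("eks").
    Returns the node group status reported at creation time.
    """
    response = eks.create_nodegroup(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name,
        subnets=list(subnet_ids),
        nodeRole=node_role_arn,
        # Same opt-in tag as the CLI command above (key spelled as in this post).
        tags={"ASG_METRICS_COLLLECTION_ENABLED": "TRUE"},
    )
    return response["nodegroup"]["status"]
```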

Node group creation takes several minutes, so wait about five minutes before verifying that Auto Scaling group metrics collection is enabled. The Amazon EKS console shows the Auto Scaling group associated with each managed node group.

Screenshot of the test-asg-2 node group

Navigate to the associated Auto Scaling group in the Amazon EC2 console and switch to the Monitoring tab. The option for Auto Scaling group metrics collection should now be enabled.

Screenshot of CloudWatch monitoring details with red box around Auto Scaling group metrics collection enabled
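You can also check the result without the console. The following minimal sketch assumes boto3 EKS and Auto Scaling clients. It looks up the node group's Auto Scaling groups with DescribeNodegroup and reads the `EnabledMetrics` list that DescribeAutoScalingGroups returns for each one. An empty list means group metrics collection is not enabled for that Auto Scaling group.

```python
def enabled_group_metrics(eks, autoscaling, cluster_name, nodegroup_name):
    """Map each Auto Scaling group behind a node group to its sorted list of enabled metrics."""
    nodegroup = eks.describe_nodegroup(
        clusterName=cluster_name, nodegroupName=nodegroup_name
    )["nodegroup"]
    names = [asg["name"] for asg in nodegroup.get("resources", {}).get("autoScalingGroups", [])]
    if not names:
        return {}  # the node group has no Auto Scaling group yet
    groups = autoscaling.describe_auto_scaling_groups(AutoScalingGroupNames=names)["AutoScalingGroups"]
    return {
        g["AutoScalingGroupName"]: sorted(m["Metric"] for m in g.get("EnabledMetrics", []))
        for g in groups
    }
```

For example, `enabled_group_metrics(boto3.client("eks"), boto3.client("autoscaling"), "[CLUSTER NAME]", "ASG-TEST")` should list metrics such as GroupDesiredCapacity once the Lambda function has run.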

Enable Auto Scaling group metrics collection for existing managed node groups

The AWS Lambda function created by the CDK code enables Auto Scaling group metrics collection only for Amazon EKS managed node groups created after you deploy the stack. To enable Auto Scaling group metrics collection for existing managed node groups in all clusters in a Region, you can use the following Python script. Note that the script doesn't check the opt-in tag; it enables collection for every managed node group it finds.

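import boto3
from botocore.exceptions import ClientError

# The clients use the Region from your AWS configuration (for example, AWS_REGION).
eks = boto3.client('eks')
autoscaling = boto3.client('autoscaling')


def enable_autoscaling_group_metrics_collection():
    # Use paginators so accounts with many clusters or node groups are fully covered.
    clusters = [
        name
        for page in eks.get_paginator('list_clusters').paginate()
        for name in page['clusters']
    ]
    print("Found", len(clusters), "EKS clusters.")
    for cluster in clusters:
        nodegroups = [
            name
            for page in eks.get_paginator('list_nodegroups').paginate(clusterName=cluster)
            for name in page['nodegroups']
        ]
        print("Cluster", cluster, "has", len(nodegroups), "node groups.")
        for nodegroup in nodegroups:
            try:
                auto_scaling_groups = eks.describe_nodegroup(
                    clusterName=cluster, nodegroupName=nodegroup
                )["nodegroup"]["resources"]["autoScalingGroups"]
            except (ClientError, KeyError) as err:
                # Skip this node group but keep processing the remaining ones.
                print("Failed to obtain Auto Scaling group for node group", nodegroup, ":", err)
                continue

            # Enable 1-minute group metrics on every Auto Scaling group of the node group.
            for auto_scaling_group in auto_scaling_groups:
                asg_name = auto_scaling_group["name"]
                try:
                    autoscaling.enable_metrics_collection(
                        AutoScalingGroupName=asg_name, Granularity="1Minute"
                    )
                except ClientError as err:
                    print("Failed to enable group metrics collection for", asg_name, ":", err)
                else:
                    print("Enabled group metrics collection for", nodegroup, "(" + asg_name + ").")


if __name__ == "__main__":
    enable_autoscaling_group_metrics_collection()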

Cleanup

Remove resources created in this post by running the following command:

 cdk destroy

Delete the node group created for testing:

aws eks delete-nodegroup \
 --cluster-name [CLUSTER NAME] \
 --nodegroup-name ASG-TEST

Conclusion

In this post, I showed you how to automatically enable Auto Scaling group metrics collection for Amazon EKS managed node groups. You can control group metrics collection for new managed node groups by adding a tag (ASG_METRICS_COLLLECTION_ENABLED=TRUE) when you create them.

You can track the progress of this request on the AWS Container Services roadmap on GitHub.

This post includes contributions from Maksim Poletaev, Sr Solutions Architect, AWS.