Containers

Ensuring fair bandwidth allocation for Amazon EKS Workloads

Independent Software Vendor (ISV) users often host their end-user solutions on a multi-tenant architecture to reduce cost and operational overhead. However, this approach can expose Kubernetes clusters to resource exhaustion or network starvation issues that impact neighboring workloads. By default, Kubernetes provides capabilities to enforce resource limits, such as CPU and memory, to prevent compute starvation. However, workloads increasingly depend on other resources, such as network bandwidth, to improve performance. For example, a pod might download terabytes of data at a very high rate to improve response time, which exhausts the available bandwidth and affects neighboring pods.

In this post, we look into how to solve this Kubernetes challenge with the Amazon Virtual Private Cloud (Amazon VPC) CNI plugin. We demonstrate how the Amazon VPC CNI plugin can be used to restrict the ingress and egress bandwidth that pods consume, preventing network starvation and helping to ensure network stability and quality of service (QoS).

What is Amazon VPC CNI?

Although there are multiple CNI plugins available, the Amazon VPC CNI plugin, developed by AWS for Amazon Elastic Kubernetes Service (Amazon EKS), allows container networking to natively use Amazon VPC networking and security features. The CNI plugin manages Elastic Network Interfaces (ENIs) on a node, whether it runs on Amazon Elastic Compute Cloud (Amazon EC2) or AWS Fargate.

When you provision a node, the plugin automatically allocates a pool of slots (IPs or prefixes) from the subnet of the node's primary ENI. It enables Kubernetes pod networking and connectivity for applications deployed on Amazon EKS and integrates Amazon VPC networking functionality directly into Kubernetes pods. For example, pods are assigned their own private IP addresses from the VPC, and security groups can be applied directly to pods.
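To see this in practice, you can check that the pod IPs reported by Kubernetes are regular VPC addresses attached to the node's ENIs as secondary IPs. The following is a quick sketch; the example IP address is a placeholder that you would replace with one of your pod IPs.

# Pod IPs come from the VPC subnets, so they show up as VPC addresses in the pod list.
kubectl get pods -o wide --all-namespaces

# The same address appears as a secondary private IP on one of the node's ENIs.
aws ec2 describe-network-interfaces \
  --filters "Name=addresses.private-ip-address,Values=10.0.1.23" \
  --query 'NetworkInterfaces[].NetworkInterfaceId'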

When the bandwidth limit capability is enabled, the Amazon VPC CNI plugin relies on the bandwidth plugin to control ingress and egress bandwidth limits for individual containers or pods using Linux traffic control utilities such as `tc` (Traffic Control).
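Under the hood, the bandwidth plugin attaches a Token Bucket Filter (`tbf`) qdisc to the pod's host-side interface, which we inspect later in Step 6. The following command is purely illustrative of the kind of shaping the plugin applies automatically; the interface name is a placeholder.

# Illustrative only - the bandwidth plugin configures this automatically for each pod.
# Replace eniexample00000 with the pod's host-side interface name.
tc qdisc add dev eniexample00000 root tbf rate 1gbit burst 512mb latency 25ms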

Walkthrough

Let’s look at how you can use the CNI plugin to enable ingress and egress traffic shaping.

Prerequisites

For this walkthrough, you should have the following prerequisites (a quick way to check these versions is shown after the list):

  • Amazon EKS cluster v1.24 and above
  • Amazon VPC CNI v1.15.0 and above
  • kubectl v1.24 and above
  • eksctl v0.175.0 and above
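You can verify the client tool versions and the Amazon VPC CNI version running in your cluster with the following commands, assuming `kubectl` is already configured for the cluster:

# Client tool versions
kubectl version --client
eksctl version

# Amazon VPC CNI version deployed in the cluster
kubectl describe daemonset aws-node -n kube-system | grep amazon-k8s-cni: | cut -d : -f 3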

Step 0: (Optional) Create an EKS cluster using eksctl

The following configuration is used with `eksctl` to provision an EKS cluster v1.28. You can skip this step if you already have an EKS cluster.

Note that if you provision this EKS cluster v1.28, then you should make sure that your `kubectl` is also v1.28 or within one minor version difference of your cluster. For example, a v1.28 client can communicate with v1.27, v1.28, and v1.29 control planes.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: dev-cluster
  region: ap-southeast-1
  version: "1.28"

managedNodeGroups:
  - name: ng-1-workers
    labels: { role: workers }
    instanceType: m6a.large
    desiredCapacity: 2
    volumeSize: 30
    privateNetworking: true
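Save the configuration to a file and create the cluster. The file name `cluster.yaml` is just an example:

eksctl create cluster -f cluster.yaml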

Step 1: Enable the CNI bandwidth plugin on the EC2 instances

Before setting bandwidth limits for pods, you need to enable the bandwidth capability of the CNI plugin by connecting to the EC2 instances. We recommend using AWS Systems Manager Session Manager to connect to them. Session Manager provides secure and auditable node management without the need to open inbound ports, maintain bastion hosts, or manage Secure Shell (SSH) keys, so you can tighten security and reduce the attack surface.

When you provision an EKS cluster with the preceding `eksctl` configuration, it uses Amazon EKS optimized Amazon Linux Amazon Machine Images (AMIs) by default. With this AMI, the EC2 instances are already configured with the necessary requirements, so you can connect to them using Session Manager without further setup.

To connect to a Linux instance using Session Manager with the Amazon EC2 console:

  1. Open the Amazon EC2 console.
  2. In the navigation pane, choose Instances.
  3. Select the instance and choose Connect.
  4. Choose Session Manager.
  5. Choose Connect.
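Alternatively, if you have the AWS CLI and the Session Manager plugin installed, you can start a session from your terminal. The instance ID shown is a placeholder:

aws ssm start-session --target i-0123456789abcdef0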

For more information and instructions on how to set up Session Manager, see Setting up Session Manager. After you’re connected to the instance, run the following commands:

sudo su
cd /etc/cni/net.d

echo "$(cat 10-aws.conflist | jq '.plugins += [{"type": "bandwidth", "capabilities": {"bandwidth": true}}]')" > 10-aws.conflist

The following JSON object is added under the `plugins` key.

{
  ...
  "plugins": [
    ...
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}
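To confirm that the bandwidth plugin was appended, you can print the plugin types from the CNI configuration:

jq '.plugins[].type' /etc/cni/net.d/10-aws.conflist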

Step 2: Install iperf and tc CLI on your EC2 instances

In this step, we install the necessary CLI tools that are used to check and test the bandwidth limitation, namely `iperf` and `tc`.

  • `iperf` is a widely used command-line tool for measuring network performance. It can be used to measure the bandwidth between two endpoints, such as between a client and a server or between two servers.
  • `tc` (Traffic control) is a user-space utility command in Linux that allows you to configure and manage the traffic control settings of network interfaces. It provides a powerful set of tools for shaping, scheduling, policing, and prioritizing network traffic.

Run the following commands to install both tools:

# If you are using Amazon Linux 2, you need to enable EPEL before installing the iperf package
# https://repost.aws/knowledge-center/ec2-enable-epel
sudo amazon-linux-extras install epel -y

sudo yum install iperf -y
sudo yum install iproute-tc -y

Amazon Linux 2023 does not support Extra Packages for Enterprise Linux (EPEL), and `iperf` is not available to install through `yum`. However, you can download and install `iperf` for AL2023 manually by following the instructions in this AWS Knowledge Center post.

Step 3: Deploy pods without bandwidth restriction

We first deploy a standard application, `nginx`, on our EKS cluster. We don't configure any bandwidth restriction yet, so that we can compare the two setups later.

Create a new file called `nginx-deployment.yaml` with the following definition:

cat << EOF > nginx-deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      securityContext:
        runAsNonRoot: true
      containers:
        - name: nginx
          image: nginxinc/nginx-unprivileged
          ports:
            - containerPort: 8080
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
          volumeMounts:
            - mountPath: /tmp
              name: tmp
      volumes:
        - emptyDir: {}
          name: tmp
EOF

Run this command to deploy:

kubectl apply -f nginx-deployment.yaml

Run this command to check the IP of a pod and the node on which it is running:

kubectl get pods -o wide
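Note the pod IP and the node on which the pod is running. If you prefer scripting, the following sketch captures the first nginx pod's IP into a shell variable (the variable name `POD_IP` is our choice); because the iperf test in the next step runs from an EC2 instance, you may still need to copy the value there.

# Capture the first nginx pod's IP address
POD_IP=$(kubectl get pods -l app=nginx -o jsonpath='{.items[0].status.podIP}')
echo "$POD_IP"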

Step 4: Test pod bandwidth without egress/ingress limits

After we deploy the application without specifying bandwidth limits, we use the `tc` command to check the current `qdisc` configuration on the EC2 instance.

  • `qdisc` (Queuing discipline) is an algorithm that manages the way packets are queued and scheduled for transmission on a network interface. It determines the order in which packets are sent out from the kernel’s packet queues to the network interface card.

Run this command to check `qdisc`:

tc qdisc show

Output:

As the following output shows, the default `pfifo_fast` qdiscs use a simple First-In, First-Out (FIFO) queue and don't perform traffic shaping or prioritization.

qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev eth0 root
qdisc pfifo_fast 0: dev eth0 parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth0 parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth0 parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth0 parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc noqueue 0: dev eniaafbc306ee3 root refcnt 2
qdisc noqueue 0: dev eni8abfd912174 root refcnt 2
qdisc mq 0: dev eth1 root
qdisc pfifo_fast 0: dev eth1 parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc noqueue 0: dev pod-id-link0 root refcnt 2
qdisc noqueue 0: dev enifb88ae1f006 root refcnt 2

Next, we use `iperf` to perform the bandwidth measurement. Replace {POD_IP} with the pod IP that you noted in Step 3, then run this command to measure the maximum achievable bandwidth:

iperf -c {POD_IP} -p 8080 -i 1

Output:

------------------------------------------------------------
Client connecting to 192.168.113.250, TCP port 80
TCP window size: 3.90 MByte (default)
------------------------------------------------------------
[  3] local 192.168.137.88 port 51026 connected with 192.168.113.250 port 80
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   594 MBytes  4.98 Gbits/sec
[  3]  1.0- 2.0 sec   592 MBytes  4.97 Gbits/sec
[  3]  2.0- 3.0 sec   592 MBytes  4.97 Gbits/sec
[  3]  3.0- 4.0 sec   592 MBytes  4.96 Gbits/sec
[  3]  4.0- 5.0 sec   592 MBytes  4.96 Gbits/sec
[  3]  5.0- 6.0 sec   592 MBytes  4.96 Gbits/sec
[  3]  6.0- 7.0 sec   592 MBytes  4.97 Gbits/sec
[  3]  7.0- 8.0 sec   592 MBytes  4.96 Gbits/sec
[  3]  8.0- 9.0 sec   592 MBytes  4.96 Gbits/sec
[  3]  9.0-10.0 sec   591 MBytes  4.96 Gbits/sec
[  3]  0.0-10.0 sec  5.78 GBytes  4.97 Gbits/sec

Step 5: Re-deploy the pods with bandwidth restriction

After testing the pod's bandwidth without egress and ingress limits, we add the following annotations to the Deployment's pod template to specify the egress and ingress bandwidth limits:

  • `kubernetes.io/ingress-bandwidth` – To control ingress bandwidth
  • `kubernetes.io/egress-bandwidth` – To control egress bandwidth
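If you prefer not to edit the manifest, the same annotations can also be added to the pod template with a patch. This is just a sketch, assuming the Deployment from Step 3:

kubectl patch deployment nginx-deployment --type merge \
  -p '{"spec":{"template":{"metadata":{"annotations":{"kubernetes.io/ingress-bandwidth":"1G","kubernetes.io/egress-bandwidth":"1G"}}}}}'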

For this walkthrough, we update the manifest in `nginx-deployment.yaml` and re-deploy it with the same command:

cat << EOF > nginx-deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        kubernetes.io/ingress-bandwidth: 1G
        kubernetes.io/egress-bandwidth: 1G
    spec:
      securityContext:
        runAsNonRoot: true
      containers:
        - name: nginx
          image: nginxinc/nginx-unprivileged
          ports:
            - containerPort: 8080
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
          volumeMounts:
            - mountPath: /tmp
              name: tmp
      volumes:
        - emptyDir: {}
          name: tmp
EOF

Re-deploy the application again:

kubectl apply -f nginx-deployment.yaml

Step 6: Test pod bandwidth with egress/ingress limits

After we re-deploy the updated manifest with ingress and egress bandwidth limits, we repeat the same procedure as in Step 4 to confirm that the new configuration is effective.

Run this command to check `qdisc`:

tc qdisc show

Output:

qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev eth0 root
qdisc pfifo_fast 0: dev eth0 parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth0 parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth0 parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth0 parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc noqueue 0: dev eniaafbc306ee3 root refcnt 2
qdisc noqueue 0: dev eni8abfd912174 root refcnt 2
qdisc mq 0: dev eth1 root
qdisc pfifo_fast 0: dev eth1 parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth1 parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc noqueue 0: dev pod-id-link0 root refcnt 2
qdisc tbf 1: dev enif603d342b24 root refcnt 2 rate 1Gbit burst 512Mb lat 25ms
qdisc ingress ffff: dev enif603d342b24 parent ffff:fff1 ----------------
qdisc tbf 1: dev bwp3163ee8c94ce root refcnt 2 rate 1Gbit burst 512Mb lat 25ms

As you can see from the output, the pod's interfaces now have a `tbf` (Token Bucket Filter) qdisc, a queueing discipline that limits traffic to the configured rate using the token bucket algorithm.
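You can also inspect the token bucket statistics on a pod's host-side interface. The interface name below is taken from the sample output above; yours will differ:

tc -s qdisc show dev enif603d342b24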

Run this command to measure the maximum achievable bandwidth:

iperf -c {POD_IP} -p 8080 -i 1

Output:

------------------------------------------------------------
Client connecting to 192.168.125.111, TCP port 80
TCP window size: 1.68 MByte (default)
------------------------------------------------------------
[  3] local 192.168.137.88 port 43566 connected with 192.168.125.111 port 80
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   594 MBytes  4.98 Gbits/sec
[  3]  1.0- 2.0 sec   149 MBytes  1.25 Gbits/sec
[  3]  2.0- 3.0 sec   118 MBytes   989 Mbits/sec
[  3]  3.0- 4.0 sec   118 MBytes   990 Mbits/sec
[  3]  4.0- 5.0 sec   119 MBytes   998 Mbits/sec
[  3]  5.0- 6.0 sec   118 MBytes   988 Mbits/sec
[  3]  6.0- 7.0 sec   119 MBytes   998 Mbits/sec
[  3]  7.0- 8.0 sec   118 MBytes   989 Mbits/sec
[  3]  8.0- 9.0 sec   118 MBytes   987 Mbits/sec
[  3]  9.0-10.0 sec   119 MBytes  1.00 Gbits/sec
[  3]  0.0-10.0 sec  1.65 GBytes  1.42 Gbits/sec

The first one-second interval exceeds the limit because the token bucket starts with a full burst allowance; after that, throughput settles at approximately 1 Gbit/sec, matching the 1G bandwidth annotation.

Before and After:

The following visualization shows the bandwidth in Gbits/sec. The orange line represents “Before” we added the bandwidth annotation to the deployment, and the blue line represents “After” we set the bandwidth annotation.

Cleaning up

Delete the deployment:

kubectl delete -f nginx-deployment.yaml

To delete your EKS cluster provisioned in Step 0:

eksctl delete cluster --name dev-cluster

Considerations

The bandwidth plugin is not compatible with the Amazon VPC CNI-based network policy feature at the time of writing this post. The Network Policy agent uses the Traffic Classifier (TC) system to enforce the configured network policies for pods. Policy enforcement fails if the bandwidth plugin is enabled, because the TC configuration of the bandwidth plugin conflicts with that of the Network Policy agent. We're exploring options to support the bandwidth plugin along with the network policy feature, and the issue is tracked through this AWS GitHub issue.
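A rough way to check whether the network policy agent is enabled on your cluster is to look for its flag on the `aws-node` DaemonSet; the exact configuration may differ depending on how the VPC CNI add-on is installed:

kubectl get daemonset aws-node -n kube-system -o yaml | grep -i enable-network-policy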

Conclusion

In this post, we showed you how to use the Amazon VPC CNI plugin and its bandwidth capability to limit ingress and egress bandwidth for applications running as pods on Amazon EKS. With this functionality, you can restrict how much network bandwidth your pods consume and prevent network starvation caused by heavy network usage from neighboring pods in a Kubernetes cluster.