AWS Open Source Blog
Amazon VPC CNI Plugin Version 1.1 Now Available
The Amazon VPC Container Network Interface (CNI) plugin allows Kubernetes pods to receive native AWS VPC IP addresses. Because the CNI plugin is a core part of Amazon Elastic Container Service for Kubernetes (EKS), the EKS team will continue to develop the project in collaboration with our partners and customers.
Today, we are releasing version 1.1 of the Amazon VPC CNI plugin. This update introduces the ability to disable source NAT for pods, adds the ability to configure pre-allocation of secondary IP addresses, ensures that CNI plugin daemons are scheduled on all nodes in a cluster, adds elastic network interface (ENI) resource tagging, and more. Starting today, all new EKS clusters will automatically schedule the aws-node daemonset with version 1.1 of the CNI plugin. If you have an existing EKS cluster, you’ll need to update the “aws-node” daemonset to use the new version of the CNI plugin.
Let’s take a look at what’s included in this release and how to update the CNI plugin to version 1.1 for existing EKS clusters.
Features and Bug Fixes in Version 1.1
Ability to Disable Source Address Translation for Pods
The CNI plugin works by allocating multiple ENIs to EC2 instances and then attaching secondary IP addresses to those ENIs. This allows the plugin to allocate as many IP addresses per instance as the instance type supports.
By default, the CNI plugin configures pods with Source Network Address Translation (SNAT) enabled: outbound packets from a pod have their source address rewritten to the primary private IP of the instance's primary ENI. This way, when traffic leaves through an AWS Internet Gateway, it is translated to the instance's public address, and return packets can be routed back to the correct EC2 instance. Leave SNAT enabled (AWS_VPC_K8S_CNI_EXTERNALSNAT is false by default) if you want your pods to run in a public subnet and communicate with the internet through an internet gateway.
SNAT can cause issues, however, if traffic from another private IP address space (for example, over VPC peering, Transit VPC, or AWS Direct Connect) attempts to communicate directly with a pod that is not attached to the primary ENI. To declare that NAT will be handled by an external device (such as an AWS NAT Gateway) rather than on the instance itself, you can now disable SNAT on the instance by setting the new environment variable AWS_VPC_K8S_CNI_EXTERNALSNAT to true.
Disable SNAT if you need to allow inbound communication to your pods from external VPNs, Direct Connect connections, and external VPCs, and your pods do not need to access the internet directly through an internet gateway. In other words, disabling SNAT is incompatible with nodes running in a public subnet: your nodes must run in a private subnet and connect to the internet through an AWS NAT Gateway or another external NAT device.
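For example, to disable SNAT on a running cluster, you can set the variable directly on the aws-node daemonset (a minimal sketch; you can equally edit the daemonset manifest before applying it):
kubectl set env daemonset aws-node --namespace kube-system AWS_VPC_K8S_CNI_EXTERNALSNAT=true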
For more background on the issue this feature addresses, see this GitHub issue about routing outside the VPC.
Configurable IP Address Pool
Today, the EKS CNI plugin creates a “warm pool” of IP addresses by pre-allocating IP addresses on EKS nodes to reduce scheduling latency. In other words: because the instance already has IP addresses allocated to it, Kubernetes doesn’t need to wait for an IP address to be assigned before it can schedule a pod. However, there are some tradeoffs in this approach: if your EKS nodes are larger instance types and can support larger numbers of IP addresses, you might find that your nodes are hogging more IP addresses than you want.
You can now tune the size of the warm pool with the WARM_IP_TARGET environment variable. It defines a threshold of available IP addresses below which L-IPAMD (the CNI plugin's local IP address management daemon) creates and attaches a new ENI to the node, allocates new IP addresses, and adds them to the warm pool. The variable can be set on the aws-node daemonset, or configured in amazon-vpc-cni.yaml.
For example, an m4.4xlarge node can have up to 8 ENIs, and each ENI can hold up to 30 IP addresses. This means an m4.4xlarge could reserve up to 240 IP addresses from your VPC CIDR for its warm pool, even when no pods are scheduled. Setting WARM_IP_TARGET to a lower number reduces how many IP addresses the node keeps attached, but if the number of pods scheduled exceeds WARM_IP_TARGET, each additional pod launch requires an EC2 AssignPrivateIpAddresses() API call, which can add latency to your pod startup times.
This parameter allows you to perform a balancing act. We recommend tuning it based on your pod launch needs: how many pods do you need to schedule and how fast do you need them to start up, versus how much of your VPC IP space you’d like your EKS nodes to occupy.
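For example, to keep roughly ten addresses warm per node, you could set the variable on the daemonset (the value 10 here is purely illustrative; pick one that matches your pod launch patterns):
kubectl set env daemonset aws-node --namespace kube-system WARM_IP_TARGET=10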
CNI Plugin Daemons Scheduled to All Nodes
Previously, node taints such as NoExecute and NoSchedule could prevent networking daemon pods such as aws-node, calico-node, and calico-typha from being scheduled on every node in the cluster. Now, these daemons are always scheduled to all nodes.
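In Kubernetes, a toleration with operator: Exists and no key matches every taint, which is how a daemonset can guarantee placement on all nodes; the exact tolerations used are defined in the aws-k8s-cni.yaml manifest. You can inspect what your cluster is running with:
kubectl get daemonset aws-node --namespace kube-system -o jsonpath='{.spec.template.spec.tolerations}'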
ENI Tagging
With this enhancement, the CNI plugin adds two new tags, node.k8s.amazonaws.com/instance_id and cluster.k8s.amazonaws.com/name, to the ENIs it creates, for easier identification and filtering.
If the environment variable CLUSTER_NAME is not set, only the node.k8s.amazonaws.com/instance_id tag is set. For example:
node.k8s.amazonaws.com/instance_id = i-0bf1f4f8e688b70fd
If CLUSTER_NAME is set, both tags are applied. For example, if CLUSTER_NAME=mycluster-1531613390:
node.k8s.amazonaws.com/instance_id = i-0bf1f4f8e688b70fd
cluster.k8s.amazonaws.com/name = mycluster-1531613390
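These tags make it straightforward to find the ENIs that belong to a cluster. For example, reusing the cluster tag value above, you could list its ENIs with the AWS CLI:
aws ec2 describe-network-interfaces --filters "Name=tag:cluster.k8s.amazonaws.com/name,Values=mycluster-1531613390"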
Release Pod IP on CNI Failure
This fixes a scenario in which, if the CNI plugin failed to set up the network stack for a pod for any reason, the pod would remain stuck in the ContainerCreating state and could not be deleted. The plugin now releases the pod's IP address so that L-IPAMD can reclaim it, which allows the kubelet on the node to complete the pod delete request.
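If you've seen pods stuck this way on version 1.0, a quick way to spot them (a simple check, not an official diagnostic) is:
kubectl get pods --all-namespaces | grep ContainerCreating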
Updating the CNI plugin
If you have an existing EKS cluster, you're likely running version 1.0 of the CNI plugin, and you'll need to update the CNI plugin to version 1.1 manually. To check which version you're currently running, use the following command:
kubectl describe daemonset aws-node --namespace kube-system | grep Image | cut -d "/" -f 2
All EKS clusters created from today forward (22:00 UTC, July 26, 2018) will automatically schedule the aws-node daemonset with version 1.1 of the CNI plugin. If you prefer, you can recreate your cluster instead of updating.
To upgrade, run the following command:
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/master/config/v1.1/aws-k8s-cni.yaml
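Once the update is applied, you can watch the rollout finish (assuming the daemonset uses the RollingUpdate update strategy) and then re-run the version check above to confirm the new image:
kubectl rollout status daemonset aws-node --namespace kube-system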
Next Steps
For more information about pod networking in Amazon EKS, please see the Amazon EKS Documentation.
For more information about the changes to the CNI Plugin, see the changelog on the GitHub repository.
We encourage you to create issues and submit PRs to the AWS CNI Plugin GitHub repository. We look forward to seeing your contributions in future releases of the CNI plugin!