Transparent encryption of node to node traffic on Amazon EKS using WireGuard and Cilium

Introduction

As the move to cloud native architectures continues to accelerate, one of the common challenges we hear from our customers is adopting security best practices in Kubernetes clusters. One area in particular that comes up often in conversations is how best to encrypt data in transit. This need is often driven by regulatory and compliance requirements. Data encryption is also often considered a critical component of a defense-in-depth security strategy.

Encrypting data in transit

Protecting sensitive data is one of the design principles of the AWS Well-Architected Framework’s Security Pillar. In addition to stringent access control strategies guided by least privilege, AWS recommends encrypting data both in transit and at rest. Encryption at rest for Kubernetes environments is straightforward and easily adopted by our customers. We have provided encryption at rest best practices for AWS and specific guidance for Amazon EKS customers. However, encrypting data in transit for Kubernetes cluster traffic requires some additional effort. Let’s quickly review common network security options in Kubernetes.

While networking is fundamental to Kubernetes, core networking functions aren’t part of the platform and are instead implemented by third-party plugins that conform to the Container Network Interface (CNI) specification. A key Kubernetes design decision was that each Pod receives a unique IP address and can communicate with any other Pod in the cluster. Network security in Kubernetes is implemented by Network Policy, which allows or denies traffic within the cluster. There is no native Kubernetes feature to encrypt data in transit. Customers have traditionally accomplished automatic network traffic encryption by leveraging features of a service mesh, such as AWS App Mesh or Istio. It’s also important to note that certain Amazon EC2 instance types automatically encrypt traffic within the same VPC.

We’ve heard from a number of customers who would like to leverage automatic encryption of data in transit but either don’t require a service mesh or don’t want to add the complexity of a service mesh to their cluster. In this post, we explore a lighter-weight option for encrypting data in transit that is built into newer versions of the Linux kernel.

WireGuard

WireGuard is an open source, lightweight, and secure Virtual Private Network (VPN) solution that is built into the Linux kernel starting in version 5.6. WireGuard was initially released only for Linux, but it’s now available cross platform and has even been back-ported to support Linux kernels earlier than v5.6. One of the unique strengths of WireGuard is the simplicity of its implementation. According to the WireGuard whitepaper, the Linux implementation can be accomplished in fewer than 4,000 lines of code. This makes the solution easily auditable with a minimal attack surface. Linus Torvalds has called WireGuard “a work of art” compared to OpenVPN and IPsec, which have codebases with hundreds of thousands of lines of code.

WireGuard has native support for containers and offers transparent encryption of application traffic running on Kubernetes clusters. Application owners don’t have to configure or manage any encryption keys or certificates, which makes the implementation seamless and automatic. Compared to Kubernetes service meshes, WireGuard is a much simpler implementation that doesn’t require a service mesh control plane or sidecar containers. WireGuard is also more performant and less resource intensive than a service mesh.

There are a number of ways to integrate WireGuard with Kubernetes. The popular Kubernetes CNI plugins Calico and Cilium have added support for WireGuard. There is also Kilo, a network overlay that works with any CNI plugin. For this post, we chose Cilium.

Cilium

Cilium is a networking, observability, and security solution that offers support for Kubernetes by providing a CNI plugin. Cilium is also the default network and security layer for Amazon EKS Anywhere, an on-premises deployment option for Amazon EKS. Cilium v1.10, released in May 2021, added support for WireGuard to enable transparent encryption for Kubernetes Pods. The Cilium agent uses WireGuard to create a secure connection between each node in the Kubernetes cluster. Traffic from Pods that leaves the node is automatically and transparently encrypted.

It’s important to note that traffic between Pods on the same host is not encrypted. The Cilium team made this decision intentionally: anyone who already has the privileges to view traffic on the node can view the raw, unencrypted traffic anyway.

Solution overview

To enable transparent encryption with WireGuard and Cilium on Amazon EKS, we’ll need:

  • Linux worker nodes with WireGuard enabled (Linux kernel version 5.6 and later has WireGuard support built in)
  • Cilium CNI version v1.10 or later installed
  • Cilium’s optional WireGuard support enabled

At the time this post was written, the Amazon EKS Optimized Amazon Linux AMI uses a Linux 5.4 kernel that doesn’t include WireGuard. While you could install a back-ported kernel module, we’ve chosen to use the Amazon EKS Optimized Bottlerocket AMI, which comes with the Linux 5.10 kernel.
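To check whether a worker node's kernel meets the 5.6 requirement, a small helper like the one below can compare a version string. This is a minimal sketch: the function name kernel_has_wireguard is hypothetical, and it looks only at the version number, so a node running an older kernel with a back-ported WireGuard module would still be reported as too old.

```shell
# Succeed (exit status 0) if a kernel version string is 5.6 or newer,
# the first mainline release that includes WireGuard.
# kernel_has_wireguard is a hypothetical helper name used for illustration.
kernel_has_wireguard() {
  version=${1%%-*}     # drop any distro suffix after the first "-"
  major=${version%%.*} # text before the first dot
  rest=${version#*.}   # text after the first dot
  minor=${rest%%.*}    # text before the next dot
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 6 ]; }
}

# Example (requires access to a running cluster):
#   for v in $(kubectl get nodes \
#       -o jsonpath='{.items[*].status.nodeInfo.kernelVersion}'); do
#     if kernel_has_wireguard "$v"; then echo "$v: ok"; else echo "$v: too old"; fi
#   done
```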

Cilium can be used alongside the Amazon VPC CNI using an advanced configuration option called CNI chaining. We chose this option, in which the Amazon VPC CNI is used for IP address management (IPAM) and core network connectivity, and Cilium provides advanced network features, including transparent encryption with WireGuard.

Walkthrough

This post walks you through the following steps:

  1. Create the Amazon EKS cluster with two Bottlerocket worker nodes
  2. Install Cilium with WireGuard enabled
  3. Deploy client and server pods on separate nodes
  4. Verify node to node encryption
  5. Cleaning up

Prerequisites

We need a few tools to set up our demo. Ensure that the tools used in the commands below (eksctl, kubectl, and Helm) are installed in your working environment.

This post uses shell variables to make it easier to substitute the actual names for your deployment. When you see placeholders like NAME=<your name>, substitute in the name for your environment.
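Before starting, it can help to confirm that the command line tools used in the walkthrough are on your PATH. The sketch below assumes the tool list is eksctl, kubectl, and helm, which are the commands that appear later in this post; missing_tools is a hypothetical helper name.

```shell
# Print the name of every listed tool that is not found on PATH.
# Prints nothing when all tools are available.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools invoked by the commands in this post.
missing=$(missing_tools eksctl kubectl helm)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
fi
```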

Create the Amazon EKS cluster with two Bottlerocket worker nodes

First, set the AWS Region variable. Replace <your region> with your own value below.

export AWS_REGION=<your region>

Then, run the following to create an Amazon EKS cluster with a node group using Bottlerocket OS.

cat << EOF > clusterconfig.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: wireguard-blog
  region: $AWS_REGION

iam:
  withOIDC: true
  
addons:
- name: vpc-cni

nodeGroups:
- name: bottlerocket
  instanceType: t3.medium
  desiredCapacity: 2
  amiFamily: Bottlerocket
  iam:
    attachPolicyARNs:
    - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
    - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
EOF
 
eksctl create cluster -f clusterconfig.yaml

It will take approximately 15–20 minutes to provision the Amazon EKS cluster along with the Bottlerocket-based node group.

Install Cilium with WireGuard enabled

In this section, we will install Cilium, which will be chained with the Amazon VPC CNI. We will use Helm to install Cilium with WireGuard enabled. To set up the Helm repository, run:

helm repo add cilium https://helm.cilium.io/
helm repo update

To install Cilium with WireGuard, run the following:

helm install cilium cilium/cilium --version 1.12.2 \
  --namespace kube-system \
  --set cni.chainingMode=aws-cni \
  --set enableIPv4Masquerade=false \
  --set tunnel=disabled \
  --set endpointRoutes.enabled=true \
  --set encryption.enabled=true \
  --set encryption.type=wireguard \
  --set l7Proxy=false 
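If you prefer to keep the settings in a file, the same values can be written to a Helm values file and passed with -f. This is a sketch of an equivalent configuration, not a different one: the keys mirror the --set flags above one for one, and the file name cilium-values.yaml is an arbitrary choice.

```shell
# Write the same settings as the --set flags above to a values file.
# The file name cilium-values.yaml is an arbitrary choice for illustration.
cat << EOF > cilium-values.yaml
cni:
  chainingMode: aws-cni      # chain Cilium with the Amazon VPC CNI
enableIPv4Masquerade: false
tunnel: disabled
endpointRoutes:
  enabled: true
encryption:
  enabled: true
  type: wireguard            # use WireGuard for transparent encryption
l7Proxy: false
EOF

# Equivalent install (requires cluster access; use instead of, not in
# addition to, the command above):
#   helm install cilium cilium/cilium --version 1.12.2 \
#     --namespace kube-system -f cilium-values.yaml
```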

To validate the setup, execute the cilium status command in one of the Cilium Pods by running:

kubectl -n kube-system exec -it ds/cilium -- cilium status | grep Encryption 

You should see something like this:

Encryption: Wireguard [cilium_wg0 (Pubkey: <snip>, Port: 51871, Peers: 1)]

WireGuard is enabled and the cilium_wg0 tunnel device encrypts all traffic that flows to other worker nodes. Now let’s deploy client and server Pods and inspect node-to-node traffic.
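The Peers value in the status line can also be checked in a script. In the two-node cluster created above, each node has one peer, as the sample output shows; we assume here that the count is generally the number of other nodes. The helper name wg_peer_count is hypothetical, and the parsing relies on the output format shown above, which may differ in other Cilium versions.

```shell
# Read the "Encryption:" line of `cilium status` on stdin and print the
# number of WireGuard peers it reports. Prints nothing if no match.
# wg_peer_count is a hypothetical helper name used for illustration.
wg_peer_count() {
  sed -n 's/.*Peers: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example (requires cluster access):
#   kubectl -n kube-system exec ds/cilium -- cilium status \
#     | grep Encryption | wg_peer_count
```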

Deploy client and server pods on separate nodes

For the server Pod, we will use Nginx. Run the following to deploy the Nginx server Pod:

cat << EOF > server-pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: server
  labels:
    blog: wireguard
    name: server
spec:
  containers:
    - name: server
      image: nginx
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: "kubernetes.io/hostname"
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        blog: wireguard
---
apiVersion: v1
kind: Service
metadata:
  name: server
spec:
  selector:
    name: server
  ports:
  - port: 80
EOF

kubectl apply -f server-pod.yaml

For the client Pod, we will use a BusyBox container to connect to the server every 2 seconds using the command watch wget server. Run the following to deploy the BusyBox client Pod:

cat << EOF > client-pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: client
  labels:
    blog: wireguard
    name: client
spec:
  containers:
    - name: client
      image: busybox
      command: ["watch", "wget", "server"]
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: "kubernetes.io/hostname"
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        blog: wireguard
EOF

kubectl apply -f client-pod.yaml

The client and server Pods will be scheduled on separate nodes due to the Pod Topology Spread Constraints. To confirm this, run:

kubectl get pod -o wide

You should see output similar to the following. Under the NODE column, you should see that the client and server Pods are scheduled on different nodes.

NAME     READY   STATUS    RESTARTS   AGE     IP                NODE                                           NOMINATED NODE   READINESS GATES
client   1/1     Running   0          6m49s   192.168.116.243   ip-192-168-14-85.us-west-2.compute.internal    <none>           <none>
server   1/1     Running   0          19m     192.168.144.92    ip-192-168-75-205.us-west-2.compute.internal   <none>           <none>
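The same check can be scripted by comparing the node names that Kubernetes records for each Pod. This is a minimal sketch; on_separate_nodes is a hypothetical helper name, and the Pod names client and server are the ones used in the manifests above.

```shell
# Succeed only if both node names are non-empty and differ.
# on_separate_nodes is a hypothetical helper name used for illustration.
on_separate_nodes() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Example (requires cluster access):
#   client_node=$(kubectl get pod client -o jsonpath='{.spec.nodeName}')
#   server_node=$(kubectl get pod server -o jsonpath='{.spec.nodeName}')
#   if on_separate_nodes "$client_node" "$server_node"; then
#     echo "client and server are on separate nodes"
#   fi
```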

Verify node to node encryption

Now, verify that the traffic between the client and server Pods is encrypted. To do this, we will inspect traffic on the WireGuard tunnel device cilium_wg0. Run the command below to open a bash shell in a Cilium Pod.

kubectl -n kube-system exec -ti ds/cilium -- bash

We’ll use tcpdump to verify that the traffic is passing through the cilium_wg0 tunnel device, where it is automatically and transparently encrypted using WireGuard.

apt-get update
apt-get -y install tcpdump
tcpdump -n -i cilium_wg0

You’ll see output like the following:

listening on cilium_wg0, link-type RAW (Raw IP), capture size 262144 bytes
02:12:54.679643 IP 192.168.116.243.55486 > 192.168.144.92.80: Flags [S], seq 3687345076, win 62727, options [mss 8961,sackOK,TS val 647076876 ecr 0,nop,wscale 7], length 0
02:12:54.681223 IP 192.168.144.92.80 > 192.168.116.243.55486: Flags [S.], seq 42389621, ack 3687345077, win 62643, options [mss 8961,sackOK,TS val 1674308618 ecr 647076876,nop,wscale 7], length 0
02:12:54.681288 IP 192.168.116.243.55486 > 192.168.144.92.80: Flags [.], ack 1, win 491, options [nop,nop,TS val 647076878 ecr 1674308618], length 0
02:12:54.681861 IP 192.168.116.243.55486 > 192.168.144.92.80: Flags [P.], seq 1:79, ack 1, win 491, options [nop,nop,TS val 647076879 ecr 1674308618], length 78: HTTP: GET / HTTP/1.1
02:12:54.682383 IP 192.168.144.92.80 > 192.168.116.243.55486: Flags [.], ack 79, win 489, options [nop,nop,TS val 1674308619 ecr 647076879], length 0
02:12:54.682594 IP 192.168.144.92.80 > 192.168.116.243.55486: Flags [P.], seq 1:239, ack 79, win 489, options [nop,nop,TS val 1674308619 ecr 647076879], length 238: HTTP: HTTP/1.1 200 OK
02:12:54.682624 IP 192.168.116.243.55486 > 192.168.144.92.80: Flags [.], ack 239, win 490, options [nop,nop,TS val 647076879 ecr 1674308619], length 0
02:12:54.682664 IP 192.168.144.92.80 > 192.168.116.243.55486: Flags [P.], seq 239:854, ack 79, win 489, options [nop,nop,TS val 1674308619 ecr 647076879], length 615: HTTP
02:12:54.682680 IP 192.168.116.243.55486 > 192.168.144.92.80: Flags [.], ack 854, win 486, options [nop,nop,TS val 647076879 ecr 1674308619], length 0
02:12:54.683288 IP 192.168.116.243.55486 > 192.168.144.92.80: Flags [F.], seq 79, ack 854, win 486, options [nop,nop,TS val 647076880 ecr 1674308619], length 0
02:12:54.683829 IP 192.168.144.92.80 > 192.168.116.243.55486: Flags [F.], seq 854, ack 80, win 489, options [nop,nop,TS val 1674308620 ecr 647076880], length 0
02:12:54.683878 IP 192.168.116.243.55486 > 192.168.144.92.80: Flags [.], ack 855, win 486, options [nop,nop,TS val 647076881 ecr 1674308620], length 0

Congrats! The node-to-node traffic is flowing through the WireGuard tunnel using transparent encryption.
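The capture above is taken on the tunnel device itself, so packets appear in their original form. To look at the encrypted side, one option is to capture UDP traffic on the WireGuard port on the node's network interface, where the payload should no longer be readable as HTTP. The sketch below builds the tcpdump filter from the Port value in the cilium status line. The helper name wg_capture_filter is hypothetical, and the interface name eth0 in the usage comment is an assumption that may differ on your nodes.

```shell
# Read the "Encryption:" line of `cilium status` on stdin and print a
# tcpdump filter that matches WireGuard's UDP port.
# Returns a non-zero status if no port is found.
# wg_capture_filter is a hypothetical helper name used for illustration.
wg_capture_filter() {
  port=$(sed -n 's/.*Port: \([0-9][0-9]*\).*/\1/p' | head -n 1)
  [ -n "$port" ] || return 1
  printf 'udp port %s\n' "$port"
}

# Example, run inside the Cilium Pod shell after defining the function
# there (eth0 is an assumed interface name):
#   tcpdump -n -i eth0 "$(cilium status | grep Encryption | wg_capture_filter)"
# With the port from the sample output above, this is the same as:
#   tcpdump -n -i eth0 udp port 51871
```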

Cleaning up

To avoid incurring future charges, delete the Amazon EKS cluster resources you created by running the following command.

eksctl delete cluster -f clusterconfig.yaml

Conclusion

In this post, we discussed the importance of protecting sensitive data and why some companies are investigating ways to automatically secure and encrypt data in transit. We introduced WireGuard, a lightweight encryption solution that is now built into recent versions of the Linux kernel. To implement WireGuard in an Amazon EKS cluster, we installed and configured Cilium to work with the Amazon VPC CNI and use WireGuard to encrypt all node-to-node traffic. For more details on configuring and troubleshooting WireGuard with Cilium, please refer to the Cilium documentation.