Containers

Amazon EKS cluster automation with GitLab CI/CD

Introduction:

Container technologies are becoming increasingly popular with customers developing new applications and migrating workloads to the cloud. One of the most common approaches to managing containers at scale is to use an orchestration technology such as Amazon Elastic Kubernetes Service (Amazon EKS). However, building a repeatable and reliable Kubernetes cluster in a multi-account (or multi-Region) environment can be challenging.

Using Continuous Integration/Continuous Deployment (CI/CD) for EKS cluster building and configuration can simplify the software and infrastructure release cycle. Moreover, the creation of on-demand ephemeral environments can facilitate development, testing, continuous integration, and quality assurance. In this article, we provide some examples and guidance on how to automate the different stages of creating an EKS cluster, and we also analyze some of the challenges.

Solution overview:

The diagram below provides a reference design for the GitLab CI/CD solution covered in this article. At a high level, this includes three stages within the CI/CD pipeline: template generation, cluster creation, and cluster configuration.

Orchestration:

There are a number of technologies that can be used to orchestrate an EKS cluster build, so it is important to consider a few key concepts before designing your delivery mechanism.

The stages of a pipeline are normally executed sequentially, while the steps within a given stage are generally executed in parallel. This provides a clean separation between the three macro stages of the pipeline: setup, build, and configuration. This article references GitLab CI/CD and includes example code snippets, but each step of the automation logic could easily be recreated with any modern CI/CD offering.
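As a minimal sketch, these stages can be declared at the top of the .gitlab-ci.yml file; the stage names below match the job definitions used throughout this article:

stages:
  - environment_setup
  - template_creation
  - cluster_creation
  - cluster_configuration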

Environment setup and prerequisites:

Before creating an Amazon EKS cluster, there are some networking requirements that need to be considered. In this article, we assume you already have an AWS account set up with a VPC, internet connectivity, and DNS resolution enabled in the VPC. Given the complexity of the topic, please refer to the official documentation for a complete overview of Amazon EKS networking, as it is not covered in detail in this article. We also assume a basic understanding of GitLab CI configuration syntax.

Building and configuring an EKS cluster can require a number of different tools and libraries. To simplify dependency management, all of the required tools can be packaged in a Docker container. Most modern CI/CD and orchestration tools allow pipelines and configurations to be executed from a container. This removes the need for specialized pipeline executors and also keeps all of the tooling in a single definition.
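For example, in GitLab CI the tools image can be set once at the top level so that every job in the pipeline runs inside the same container (the image name below is a placeholder for your own tools image):

image: <registry>/eks-tools:latest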

Note: A typical EKS cluster build can take more than 10 minutes. It is good practice to check that all binaries and tools are installed and configured on the runner (or container) before building the cluster. This helps you avoid errors at a later stage of the pipeline and also records the tool versions for an audit trail. See the GitLab CI pipeline example below:

CheckBinaries:
    stage: environment_setup
    script:
      - aws --version
      - eksctl version
      - python3 --version
      - pip3 list
      - kubectl version --client
      - helm version

Multi-account build:

In a large organization with multiple AWS accounts, we recommend using a dedicated AWS account for CI/CD to provide better security controls and contain the blast radius. However, this can create some complexity around authentication and IAM credentials.

To work around this, we can leverage AWS STS to generate a set of temporary credentials. A simple bash script can assume a role in a different AWS account, taking the AWS Region and the role to be assumed as arguments (see the example below).

In this way, the same cluster creation and configuration pipeline can be executed in multiple target accounts.

# Usage: source assume_role.sh <aws region> <role ARN>
REGION=$1
ROLE=$2

echo "===== assuming permissions => $ROLE ====="
# Request temporary credentials and capture the access key, secret key, and session token
KST=($(aws sts assume-role --role-arn "$ROLE" --role-session-name "ClusterDeploy" --query '[Credentials.AccessKeyId,Credentials.SecretAccessKey,Credentials.SessionToken]' --output text))
unset AWS_SECURITY_TOKEN
# Export the temporary credentials so that subsequent AWS CLI calls run in the target account
export AWS_DEFAULT_REGION=$REGION
export AWS_ACCESS_KEY_ID=${KST[0]}
export AWS_SECRET_ACCESS_KEY=${KST[1]}
export AWS_SESSION_TOKEN=${KST[2]}
export AWS_SECURITY_TOKEN=${KST[2]}

In GitLab CI, before_script is used to define commands that run before each job's script section. In this case, we can configure each stage to source the bash script above and assume a role in the target account.

SetupCredentials:
    stage: environment_setup
    before_script:
      - source assume_role.sh <aws region> <role to be assumed>
    script:
      - aws sts get-caller-identity
      - <some other command>

Note: This requires the sts:AssumeRole permission for your GitLab runner. Please refer to the documentation for how to set up multi-account roles.
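As a minimal sketch, the IAM policy attached to the runner's credentials could allow it to assume the target account role (the account ID and role name are placeholders); the role in the target account must also trust the CI/CD account in its trust policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::<target account id>:role/<role name>"
    }
  ]
}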

EKS cluster creation:

Eksctl is a simple command line interface for creating and managing Kubernetes clusters on Amazon EKS. The binary accepts arguments and parameters via the command line. However, it can be difficult to manage more than a handful of parameters, particularly across different builds. A more programmatic approach is to use config files, although this creates the problem of managing different templates across environments. A number of templates can be created and stored in the version control system (VCS) of your choice, but we recommend defining some patterns and generating templates dynamically using conditions such as environment, cluster size, and hosted application(s).

In this example, we use a Python templating library named Jinja2 to replace two sections of the template. This allows us to bake some logic into the template and render a file dynamically based on predefined configuration snippets. We can then define different configurations (e.g., the node type in the snippet below) or build more complex logic around multi-environment pipelines to differentiate the templates across environments (e.g., development and production).

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: {{cluster_name}}
  region: eu-west-2
  version: "1.16"
  
[...]
{% if node_type == 'byon' -%}
nodeGroups:
  - name: ng-1-workers
    labels: { role: workers }
    instanceType: m5.large
    desiredCapacity: 3
    minSize: 2
    maxSize: 6
[...]
{% elif node_type == 'managed' %}
managedNodeGroups:
  - name: managed-ng-1
    labels: {role: worker}
    instanceType: t3.large
    minSize: 3
    maxSize: 10
[...]

The template can then be rendered with a short Python script (excerpt below; args holds the parsed command line arguments):

[...]
from jinja2 import Template

# Load the Jinja2 template from disk
with open("cluster_config.j2") as file_:
    template = Template(file_.read())

# Render the eksctl config file with the supplied cluster name and node type
with open("eks_template.yaml", "w") as fh:
    fh.write(
        template.render(cluster_name=args.cluster_name, node_type=args.node_type)
    )
[...]

The script can then be called inside the pipeline with a minimal number of arguments to generate the eksctl template on demand.

TemplateCreation:
    stage: template_creation
    script:
      - python3 gen_template.py -nt $NODE_TYPE -n $CLUSTER_NAME
    artifacts:
      expose_as: 'EKSCTL TEMPLATE'
      paths: ['eks_template.yaml']

Note: In the snippet above, the pipeline is instructed to save the generated file as a build artifact, so that it can be used by later stages, shared across pipelines, or kept for future reference.

We can then use eksctl to create the cluster from the template we generated:

ClusterCreation:
    stage: cluster_creation
    script:
      - if ! eksctl get cluster -n $CLUSTER_NAME; then eksctl create cluster -f eks_template.yaml; else eksctl update cluster -f eks_template.yaml; fi

Cluster authentication:

When an Amazon EKS cluster is created, the AWS Identity and Access Management (IAM) entity that creates the cluster is added to the Kubernetes RBAC authorization table as the administrator (with system:masters permissions). This entity could be a user or a role.

Traditional CI/CD systems normally use a different set of credentials, so it is good practice to add at least one administrator role to the aws-auth ConfigMap. We can use eksctl to add one or more roles as part of the pipeline.

SetRBAC:
    stage: cluster_configuration
    script:
      - eksctl create iamidentitymapping --cluster $CLUSTER_NAME --arn <role ARN> --group system:masters --username admin
    only:
      - master
    when: on_success

The AWS CLI allows you to build a configuration file for kubectl with prepopulated server and certificate authority data values for a specified cluster. We can specify an IAM role ARN with the --role-arn option to use for authentication when kubectl commands are issued (e.g., kubectl apply or kubectl get). This allows us to use common CLI tools such as Helm or kubectl and configure them on demand in the pipeline executor.
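As an illustration, a job could generate the kubeconfig with the admin role mapped earlier and verify cluster access (the job name ClusterAccess is illustrative):

ClusterAccess:
    stage: cluster_configuration
    script:
      - aws eks update-kubeconfig --region $AWS_REGION --name $CLUSTER_NAME --role-arn <role ARN>
      - kubectl get nodes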

Cluster configuration:

When it comes to cluster configuration, there is no cookie-cutter approach. There are a few options to consider.

Option 1: Install and configure plugins using the native kubectl integration.

For example:

Ingress:
    stage: cluster_configuration
    before_script:
      - aws eks --region $AWS_REGION update-kubeconfig --name $CLUSTER_NAME
    script:
      - kubectl apply -f https://raw.githubusercontent.com/kubernetes/examplechart.yaml
      - helm install stable/example --generate-name

Option 2: Take a programmatic approach

Depending on your requirements, you may want to consider defining a baseline so that your plugins and configurations are applied to every environment you create.

Many common add-ons for Kubernetes are available via Helm, so we can define a set of releases we want to install and leverage a tool such as the “Helm Operator,” a Kubernetes operator that allows you to declaratively manage Helm chart releases.

This introduces the concept of a “HelmRelease,” a Kubernetes custom resource that allows us to define our charts and configurations using a known syntax instead of running a series of helm install commands. The operator not only creates each release automatically, but also ensures that the state of the release doesn’t diverge from its definition in code by running a reconciliation loop across all of the releases.
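As a minimal sketch, the operator itself could be installed from a cluster_configuration job. The snippet below assumes the FluxCD Helm Operator chart and its CRD manifest; check the project documentation for the current installation steps:

HelmOperator:
    stage: cluster_configuration
    script:
      - kubectl create namespace flux || true
      - kubectl apply -f https://raw.githubusercontent.com/fluxcd/helm-operator/master/deploy/crds.yaml
      - helm repo add fluxcd https://charts.fluxcd.io
      - helm upgrade -i helm-operator fluxcd/helm-operator --namespace flux --set helm.versions=v3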

Example of a HelmRelease definition:

---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: prometheus
  namespace: monitoring
spec:
  releaseName: prometheus
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: prometheus
    version: 11.1.4
  values:
    alertmanager:
      persistentVolume:
        storageClass: gp2
        size: 10Gi
    server:
      persistentVolume:
        size: 100Gi
        storageClass: gp2
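
Once the operator is running, a HelmRelease manifest can be applied from the pipeline like any other Kubernetes resource (the job and file names below are illustrative):

Monitoring:
    stage: cluster_configuration
    script:
      - kubectl apply -f prometheus_helmrelease.yaml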

The GitLab interface:

The GitLab web interface can help us to obtain an overview of the entire build.

The GitLab web interface can also help us to monitor the previous pipeline executions, showing Status, Commit, and Duration.

The details and logs of each stage can be seen by clicking on the stage name.

Conclusions:

By using CI/CD or orchestration software, we can automate a number of manual tasks and minimize the need for scripting. This article has provided an overview of some examples, stages, and steps that can be followed to optimize the configuration and logic around your EKS cluster creation. This will not only ensure your cluster configuration is consistent, but also reduce the maintenance overhead of running multiple EKS clusters and help to enforce a standard configuration across the different environments.