Introducing cdk8s+: Intent-driven APIs for Kubernetes objects

At AWS, we’ve been exploring new approaches to making it easier to define Kubernetes applications. Last month, we announced the alpha release of cdk8s, an open-source project that enables you to use general-purpose programming languages to synthesize Kubernetes manifests.

Today, I would like to tell you about cdk8s+ (cdk8s-plus), which we believe is the natural next step for this project.

cdk8s+ is a library built on top of cdk8s. It is a rich, intent-based class library for using the core Kubernetes API. It includes handcrafted constructs that map to native Kubernetes objects and expose a richer API with reduced complexity.

To give you an idea of what I mean, here is how you’d define a Deployment and expose it on port 8000 via a Service:

const deployment = new kplus.Deployment(this, 'MyApp', {
  spec: {
    replicas: 3,
    podSpecTemplate: {
      containers: [new kplus.Container({
        image: 'node',
        port: 9000
      })],
    },
  },
});

// this will internally create a service
deployment.expose({port: 8000});

Notice how we didn’t have to configure any selectors, nor did we have to specify the internal port used by the container when exposing our deployment. The snippet above will generate the following YAML manifest:

kind: Deployment
apiVersion: apps/v1
spec:
  replicas: 3
  selector:
    matchLabels:
      cdk8s.deployment: MyAppC6A88652
  template:
    metadata:
      labels:
        cdk8s.deployment: MyAppC6A88652
    spec: <pod-spec-omitted-for-brevity>
---
kind: Service
apiVersion: v1
spec:
  type: ClusterIP
  ports:
    - port: 8000
      targetPort: 9000 # this is the port exposed by container.
  selector:
    cdk8s.deployment: MyAppC6A88652

Later on, we will dive deeper into the API and the considerations we made building it.

cdk8s+ is in alpha

Note: the library is in the very early stages of development. As such, it may lack substantial features and may introduce breaking changes between updates. Use it with care and at your own discretion.

All breaking and non-breaking changes will be published in the CHANGELOG.

Getting started

Head over to our GitHub repo and try it out. You’ll find documentation for all the available constructs, as well as a full API spec. We would love to hear what you think is missing, and if you so choose, actively participate in the development.

The library is currently available for TypeScript and Python, with more languages coming soon. Also note that the generated manifests are completely agnostic to the cloud provider you are using. cdk8s+ produces pure Kubernetes files that can be applied to any cluster.

Diving deep

This post shows how to deploy a real-world Kubernetes application using cdk8s+. To get a complete picture of its benefits, we will develop the application in three different ways: first by directly authoring a YAML manifest, then by using a programming language (TypeScript) and cdk8s to generate a manifest, and finally by using the rich API provided by cdk8s+.

Before we start, let’s introduce a few guiding principles that will help us navigate through the different approaches.

  • Desired State: We’d like our application definition to be solely based on a desired state configuration. This is necessary in order to apply infrastructure-as-code best practices, as well as enable GitOps workflows.
  • Don’t Repeat Yourself (DRY): Avoid having to repeat any value or definition in multiple locations. This makes our desired state much less sensitive to change.
  • Boilerplate: Information that can be inferred, should be inferred. Having to repeatedly apply common configurations makes our application overly complex and more error prone.
  • Cognitive Load: Ideally, we should be able to write our application without exactly remembering how to configure each resource. We want the tools to guide us.
  • Reusability: Once our application is done, we’d like for it to be easy to share our work with others.

We will go back to these guidelines throughout this post, and see how each approach addresses them.

Okay, we are now ready to get started. First, let’s describe our application:

Construct Catalog Search

Those of you familiar with the constructs ecosystem might have already encountered awscdk.io. It’s a website for discovering constructs and is maintained as an open-source project at https://github.com/construct-catalog/catalog. Today, the catalog simply posts a tweet every time a new CDK construct is published. It then uses Twitter itself as somewhat of a search engine.

If you’re looking for information on how to publish your own construct library, check out Publishing Modules.

We’d like the catalog to provide a “real” search experience with filtering and aggregation capabilities. To do that, every time a new library is published, we are going to index an event to an Elasticsearch cluster, using an Amazon SQS queue in the middle. In addition, we will expose an endpoint that will perform a free text ES query.

So, our application has two components:

  1. Query Server: An HTTP server accepting requests and performing Elasticsearch queries. (query.js)
  2. Indexer Worker: A long running poller process that fetches messages from the queue and indexes them to Elasticsearch. (indexer.js)

As inputs, our app will accept a QUEUE_URL and an ELASTICSEARCH_ENDPOINT env variable.
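As a rough illustration, here is one way a component such as the indexer might read these two inputs at startup and fail fast when one is missing. This is a minimal sketch, not the catalog’s actual code: the AppConfig type and the loadConfig function are hypothetical names introduced here.

```typescript
// Hypothetical shape of the app's inputs; these names are assumptions for illustration.
interface AppConfig {
  queueUrl: string;
  elasticsearchEndpoint: string;
}

// Reads both inputs from an environment-like record and throws if either is missing or empty.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (value === undefined || value === '') {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  return {
    queueUrl: required('QUEUE_URL'),
    elasticsearchEndpoint: required('ELASTICSEARCH_ENDPOINT'),
  };
}
```

In a Node.js process this would typically be called as loadConfig(process.env). Taking the environment as a parameter keeps the function easy to test.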

Note that the Elasticsearch cluster is actually created with cdk8s as well using a CRD. The code is available here: elasticsearch.ts.

Assuming the application code has already been written, we now want to deploy it to a Kubernetes cluster.

Construct Catalog Search: Using YAML

As mentioned, we will first write pure k8s YAML.

To get my application code inside a container, I will try to embed it inside a ConfigMap, and later configure my pod to use that ConfigMap.

kind: ConfigMap
metadata:
  name: query-config-map
apiVersion: v1
data:
  query.js: // hmm...

As you can see, I’ve hit my first snag: how do I get my code from query.js to the manifest file?

Kubectl to the rescue

kubectl has native support for creating ConfigMap data from files:

❯ kubectl create configmap query-config-map --from-file=./query.js
configmap/query-config-map created

The next step is to create a Deployment that deploys our query server.

kind: Deployment
metadata:
  name: query-deployment
apiVersion: apps/v1
spec:
  replicas: 3
  template:
    spec:
      volumes:
        - name: query-app-volume
          configMap: 
            name: query-configmap
      containers:
        - name: 'query'
          image: 'node:12.18.0-stretch'
          ports:
            - containerPort: 8080
          command: ["node", "query.js"]
          env:
            - name: ELASTICSEARCH_ENDPOINT
              value: https://my.elasticsearch.cluster:9200/
          workingDir: /root
          volumeMounts:
            - mountPath: /root
              name: query-app-volume

So far, I have defined a Deployment with a replica count of 3 and specified a pod template. My pods will include a ConfigMap-based volume that will be mounted to /root.

Let’s apply it to the cluster and see what happens.

❯ kubectl apply -f manifest.yaml 
error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec;

Whoops, of course, I forgot to apply selectors so that the deployment will be able to find its pods.

This does raise the question: how do I make sure to keep the selectors in sync with the labels? Also, since the pod spec is defined in the scope of a deployment, it makes sense for the deployment to simply select all the pods it created.
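With plain YAML, nothing keeps the two in sync for us, so the check has to happen somewhere else. A minimal sketch of such a check, assuming the manifest has already been parsed into an object (the DeploymentLike type and selectorMatchesTemplate function are hypothetical helpers, not part of kubectl or cdk8s):

```typescript
type Labels = Record<string, string>;

// Only the parts of a Deployment manifest that this check needs.
interface DeploymentLike {
  spec: {
    selector?: { matchLabels?: Labels };
    template: { metadata?: { labels?: Labels } };
  };
}

// Returns true when every selector label is present, with the same value, on the pod template.
function selectorMatchesTemplate(deployment: DeploymentLike): boolean {
  const matchLabels = deployment.spec.selector?.matchLabels ?? {};
  const podLabels = deployment.spec.template.metadata?.labels ?? {};
  const keys = Object.keys(matchLabels);
  // A missing or empty selector counts as a failure here, mirroring the validation error above.
  return keys.length > 0 && keys.every((key) => podLabels[key] === matchLabels[key]);
}
```

A check like this catches the mismatch, but it is still extra tooling we would have to write and remember to run.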

Ok, let’s add selectors and re-apply:

kind: Deployment
metadata:
  name: query-deployment
apiVersion: apps/v1
spec:
  replicas: 3
  selector: 
    matchLabels:
      app: query # instruct the deployment to select pods with this label
  template:
    metadata: 
      labels:
        app: query # apply a label to the pods
    spec:
      volumes:
        - name: query-app-volume
          configMap: 
            name: query-configmap
      containers:
        - name: 'query'
          image: 'node:12.18.0-stretch'
          ports:
            - containerPort: 8080
          command: ["node", "query.js"]
          env:
            - name: ELASTICSEARCH_ENDPOINT
              value: https://my.elasticsearch.cluster:9200/
          workingDir: /root
          volumeMounts:
            - mountPath: /root
              name: query-app-volume

❯ kubectl apply -f manifest.yaml
deployment.apps/query-deployment created

Looks okay, let’s check out our pods:

❯ kubectl get -A pods
NAMESPACE   NAME                               READY   STATUS             RESTARTS   AGE
default     query-deployment-6576f6f795-4kttz  0/1     ContainerCreating  0          2m3s
default     query-deployment-6576f6f795-dqmwd  0/1     ContainerCreating  0          2m3s
default     query-deployment-6576f6f795-f2lqr  0/1     ContainerCreating  0          2m3s

We see 3 pods indeed, but for some reason they have been in ContainerCreating status for a long time. Let’s inspect one of them:

❯ kubectl describe pod query-deployment-6576f6f795-4kttz
....
....
Events:
  Type      Reason      Age                From                                    Message
  ----      ------      ----               ----                                    -------
  Normal    Scheduled   61s                default-scheduler                       Successfully assigned default/query-deployment-6576f6f795-4kttz to kind-control-plane
  Warning   FailedMount 29s (x7 over 61s)  kubelet, kind-control-plane             MountVolume.SetUp failed for volume "query-app-volume" : configmap "query-configmap" not found

Uh oh, did you spot the typo? We used query-configmap instead of query-config-map as the ConfigMap name.

This raises another question: since my ConfigMap is created out of band, how do I keep these values in sync?

Okay, let’s fix that and reapply:

kind: Deployment
metadata:
  name: query-deployment
apiVersion: apps/v1
spec:
  replicas: 3
  selector: 
    matchLabels:
      app: query
  template:
    metadata: 
      labels:
        app: query
    spec:
      volumes:
        - name: query-app-volume
          configMap: 
            name: query-config-map # this was our typo
      containers:
        - name: 'query'
          image: 'node:12.18.0-stretch'
          ports:
            - containerPort: 8080
          command: ["node", "query.js"]
          env:
            - name: ELASTICSEARCH_ENDPOINT
              value: https://my.elasticsearch.cluster:9200/
          workingDir: /root
          volumeMounts:
            - mountPath: /root
              name: query-app-volume

❯ kubectl apply -f manifest.yaml && sleep 10 && kubectl get -A pods
deployment.apps/query-deployment configured
NAMESPACE            NAME                                         READY   STATUS    RESTARTS   AGE
default              query-deployment-6d95544db6-57rz2            1/1     Running   0          18s
default              query-deployment-6d95544db6-bz4qx            1/1     Running   0          15s
default              query-deployment-6d95544db6-qf6br            1/1     Running   0          16s

Cool, all seems to be in order!

The final step is to expose the pods as a service, so that they can be queried through a single network address. For the sake of simplicity, I’ll use a ClusterIP service, which is also the Kubernetes default.

kind: Service
apiVersion: v1
metadata:
  name: query-service
spec:
  type: ClusterIP
  ports:
    - port: 8000
      targetPort: 8080 # damn, another duplication
  selector:
    app: query # let's not forget the selector this time...
---
kind: Deployment
metadata:
  name: query-deployment
apiVersion: apps/v1
spec:
  replicas: 3
  selector: 
    matchLabels:
      app: query
  template:
    metadata: 
      labels:
        app: query
    spec:
      volumes:
        - name: query-app-volume
          configMap: 
            name: query-config-map
      containers:
        - name: 'query'
          image: 'node:12.18.0-stretch'
          command: ["node", "query.js"]
          env:
            - name: ELASTICSEARCH_ENDPOINT
              value: https://my.elasticsearch.cluster:9200/
          ports:
            - containerPort: 8080
          workingDir: /root
          volumeMounts:
            - mountPath: /root
              name: query-app-volume

One more question: how do I make sure the selector the service uses matches the label of the pods? The same goes for the target port, which has to be the same as the container port.

❯ kubectl apply -f manifest.yaml
service/query-service created
deployment.apps/query-deployment unchanged
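The apply succeeds either way, because Kubernetes does not cross-check a Service against the pods it is meant to front. Below is a minimal sketch of a check we could run ourselves on parsed manifests. The types and the findServiceMismatches function are hypothetical, and the port check assumes containers declare their ports, which Kubernetes itself treats as informational.

```typescript
type LabelMap = Record<string, string>;

interface ServiceLike {
  spec: { selector: LabelMap; ports: { port: number; targetPort: number }[] };
}

interface PodTemplateLike {
  metadata: { labels: LabelMap };
  spec: { containers: { name: string; ports?: { containerPort: number }[] }[] };
}

// Lists the ways a Service fails to line up with the pod template it is meant to front.
function findServiceMismatches(service: ServiceLike, template: PodTemplateLike): string[] {
  const problems: string[] = [];

  // Every selector label should appear on the pods with the same value.
  for (const key of Object.keys(service.spec.selector)) {
    if (template.metadata.labels[key] !== service.spec.selector[key]) {
      problems.push(`selector ${key}=${service.spec.selector[key]} does not match the pod labels`);
    }
  }

  // Collect every port declared by any container in the template.
  const containerPorts: number[] = [];
  for (const container of template.spec.containers) {
    for (const p of container.ports ?? []) {
      containerPorts.push(p.containerPort);
    }
  }

  // Every targetPort should be one of the declared container ports.
  for (const p of service.spec.ports) {
    if (containerPorts.indexOf(p.targetPort) === -1) {
      problems.push(`targetPort ${p.targetPort} is not declared by any container`);
    }
  }

  return problems;
}
```

An empty result means the selector and target ports line up with the template. Anything else is a mismatch that would otherwise only show up as a Service with no reachable endpoints.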

If I now port-forward 8000 on my machine, I should get a response from the pods.

Great, the query application is working. All that’s left to do is add the indexer. The indexer specification is basically the same as the query, except it is not exposed by a service. Eventually, we end up with this full manifest file:

kind: Deployment
metadata:
  name: query-deployment
apiVersion: apps/v1
spec:
  replicas: 3
  selector: 
    matchLabels:
      app: query
  template:
    metadata: 
      labels:
        app: query
    spec:
      volumes:
        - name: query-app-volume
          configMap: 
            name: query-config-map
      containers:
        - name: 'query'
          image: 'node:12.18.0-stretch'
          ports:
            - containerPort: 8080
          command: ["node", "query.js"]
          env:
            - name: ELASTICSEARCH_ENDPOINT
              value: https://my.elasticsearch.cluster:9200/
          workingDir: /root
          volumeMounts:
            - mountPath: /root
              name: query-app-volume
---
kind: Deployment
metadata:
  name: indexer-deployment
apiVersion: apps/v1
spec:
  replicas: 1
  selector: 
    matchLabels:
      app: indexer
  template:
    metadata: 
      labels:
        app: indexer
    spec:
      volumes:
        - name: indexer-app-volume
          configMap: 
            name: indexer-config-map
      containers:
        - name: 'indexer'
          image: 'node:12.18.0-stretch'
          command: ["node", "indexer.js"]
          env:
            - name: ELASTICSEARCH_ENDPOINT
              value: https://my.elasticsearch.cluster:9200/
            - name: QUEUE_URL
              value: https://sqs.us-east-1.amazonaws.com/111111111/my-queue
          workingDir: /root
          volumeMounts:
            - mountPath: /root
              name: indexer-app-volume
---
kind: Service
apiVersion: v1
metadata:
  name: query-service
spec:
  type: LoadBalancer
  ports:
    - port: 8000
      targetPort: 8080
  selector:
    app: query

Also, we have two auxiliary commands we need to run before applying this manifest:

❯ kubectl create configmap query-config-map --from-file=./query.js
❯ kubectl create configmap indexer-config-map --from-file=./indexer.js

Let’s recap, and specifically, focus on our guiding principles:

  • ❌  Desired State: Unfortunately, we were unable to define our application solely using a desired state YAML manifest. We had to resort to external imperative kubectl commands.
  • ❌  Don’t Repeat Yourself (DRY): We’ve seen several occurrences of having to duplicate and match values across multiple locations in the manifest.
  • ❌  Boilerplate: We have to explicitly apply selectors to the pod template and the deployment, even though the template is configured in the scope of the deployment, and can implicitly infer the selection labels. The same goes for pod spec volumes.
  • ❌  Cognitive Load: Even a simple application such as ours required us to have rather extensive Kubernetes skills. We had to know what selectors are, how to create config maps with kubectl, and how to mount them as volumes onto the container.
  • ❌  Reusability: Since deploying our app involves running some custom kubectl commands, sharing it with others becomes tricky. We need to come up with a non-standard packaging and distribution mechanism. Also, our application cannot accept any configuration values, since we don’t have the ability to dynamically generate a manifest.

Construct Catalog Search: Using cdk8s

Next up, we explore the possibility of authoring a manifest file using a general purpose programming language. This is enabled by cdk8s, and truly opens the world of programming languages to infrastructure definitions. We already know exactly what we need to do, so let’s write down the entire application:

import * as k8s from '../../imports/k8s';
import * as cdk8s from 'cdk8s';
import * as fs from 'fs';

const app = new cdk8s.App();
const chart = new cdk8s.Chart(app, 'Search');

const indexerSelectionLabels = { app: 'indexer' };
const querySelectionLabels = { app: 'query' };
const image = 'node:12.18.0-stretch';
const queryVolumeName = 'query-app-volume';
const indexerVolumeName = 'indexer-app-volume';
const queryPort = 8080;

const indexerConfigMap = new k8s.ConfigMap(chart, 'IndexerConfigMap', {
  data: {
    'indexer.js': fs.readFileSync(`${__dirname}/indexer.js`, 'utf8'),
  },
})

const queryConfigMap = new k8s.ConfigMap(chart, 'QueryConfigMap', {
  data: {
    'query.js': fs.readFileSync(`${__dirname}/query.js`, 'utf8'),
  },
})

new k8s.Deployment(chart, 'IndexerDeployment', {
  spec: {
    selector: {
      matchLabels: indexerSelectionLabels,
    },
    template: {
      metadata: {
        labels: indexerSelectionLabels,
      },
      spec: {
        volumes: [{
          name: indexerVolumeName,
          configMap: {
            name: indexerConfigMap.name,
          },
        }],
        containers: [{
          name: 'indexer',
          image: image,
          command: [ 'node', 'indexer.js' ],
          env: [
            {
              name: 'ELASTICSEARCH_ENDPOINT',
              value: 'https://my.elasticsearch.cluster:9200/',
            },
            {
              name: 'QUEUE_URL',
              value: 'https://sqs.us-east-1.amazonaws.com/111111111/my-queue',
            },
          ],
          workingDir: '/root',
          volumeMounts: [{
            mountPath: '/root',
            name: indexerVolumeName,
          }],
        }],
      },
    },
  },
})

new k8s.Deployment(chart, 'QueryDeployment', {
  spec: {
    selector: {
      matchLabels: querySelectionLabels,
    },
    template: {
      metadata: {
        labels: querySelectionLabels,
      },
      spec: {
        volumes: [{
          name: queryVolumeName,
          configMap: {
            name: queryConfigMap.name,
          },
        }],
        containers: [{
          name: 'query',
          image: image,
          ports: [{
            containerPort: queryPort,
          }],
          command: [ 'node', 'query.js' ],
          env: [
            {
              name: 'ELASTICSEARCH_ENDPOINT',
              value: 'https://my.elasticsearch.cluster:9200/',
            },
          ],
          workingDir: '/root',
          volumeMounts: [{
            mountPath: '/root',
            name: queryVolumeName,
          }],
        }],
      },
    },
  },
})

new k8s.Service(chart, 'QueryService', {
  spec: {
    selector: querySelectionLabels,
    type: 'LoadBalancer',
    ports: [{
      port: 80,
      targetPort: queryPort,
    }],
  },
});

// synthesize the chart into a YAML manifest
app.synth();

The API itself is basically a mirror of the YAML definition, but since we are now writing code, let’s see where we stand with our guiding principles:

  • ✅  Desired State: We no longer need any external kubectl commands. Getting our application code into the manifest is done by simply using fs.readFileSync.
  • ✅  Don’t Repeat Yourself (DRY): Any duplicate value is defined once, as a constant, and is reused when needed.
  • ❌  Boilerplate: Unfortunately, we still need to remember to apply selectors and configure pod spec volumes, even though this information can be inferred.
  • ❌  Cognitive Load: We haven’t solved this problem. We still require the same set of Kubernetes skills to write this application.
  • ✅  Reusability: We have two options here. 1) Publish a self-contained YAML manifest generated by running cdk8s synth. 2) Publish an npm library that may or may not accept configuration values, and delegate the manifest generation to our users. Both ways are standard and simple.
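To make the second option a little more concrete, here is a rough sketch of a library entry point that accepts configuration values and returns manifests. To stay self-contained it builds plain objects instead of cdk8s constructs, and it leaves out the ConfigMap volume. The QuerySearchProps type, the buildQueryManifests function, and the default values are hypothetical names and assumptions made for this example.

```typescript
// Hypothetical configuration values a consumer of the library would pass in.
interface QuerySearchProps {
  elasticsearchEndpoint: string;
  replicas?: number;    // assumed default: 3
  servicePort?: number; // assumed default: 8000
}

// Builds the query Deployment and Service as plain manifest objects.
// The ConfigMap volume is left out to keep the sketch short.
function buildQueryManifests(props: QuerySearchProps) {
  const labels = { app: 'query' }; // shared by the selectors and the pod template
  const containerPort = 8080;      // shared by the container and the service

  const deployment = {
    apiVersion: 'apps/v1',
    kind: 'Deployment',
    metadata: { name: 'query-deployment' },
    spec: {
      replicas: props.replicas ?? 3,
      selector: { matchLabels: labels },
      template: {
        metadata: { labels },
        spec: {
          containers: [{
            name: 'query',
            image: 'node:12.18.0-stretch',
            ports: [{ containerPort }],
            command: ['node', 'query.js'],
            env: [{ name: 'ELASTICSEARCH_ENDPOINT', value: props.elasticsearchEndpoint }],
          }],
        },
      },
    },
  };

  const service = {
    apiVersion: 'v1',
    kind: 'Service',
    metadata: { name: 'query-service' },
    spec: {
      type: 'ClusterIP',
      ports: [{ port: props.servicePort ?? 8000, targetPort: containerPort }],
      selector: labels,
    },
  };

  return { deployment, service };
}
```

A consumer would call buildQueryManifests({ elasticsearchEndpoint: '...' }) with their own endpoint and serialize the result, which is the kind of configurability the YAML-only approach could not offer.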

In our next approach, I’ll show you how to address the two remaining principles through an approach called Intent-driven Design. By focusing on user intent, rather than on system mechanics, we can perform many operations on the user’s behalf, thus greatly reducing cognitive load and boilerplate definitions.

Construct Catalog Search: Using cdk8s+

Just like before, I’ll start by creating a ConfigMap that will contain our source code.

import * as kplus from 'cdk8s-plus';

// create a ConfigMap construct.
const queryConfigMap = new kplus.ConfigMap(this, 'QueryConfigMap');

A quick look at the API the kplus.ConfigMap construct offers reveals the addFile() method. This conveys our intent of embedding a file in a ConfigMap. It essentially simulates the external kubectl command we used before.

Let’s use it:

queryConfigMap.addFile(`${__dirname}/query.js`);
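To give a feel for what a method like this does on our behalf, here is a minimal sketch of an addFile-style helper. It is not the cdk8s+ implementation: the ConfigMapData class is a hypothetical stand-in, and keying the entry by the file’s base name is an assumption modeled on how kubectl’s --from-file behaves.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// A stand-in for a ConfigMap's data section; not the cdk8s+ class.
class ConfigMapData {
  public readonly data: Record<string, string> = {};

  // Reads a file and stores its contents under a key (by default, the file's base name).
  public addFile(localFile: string, key?: string): void {
    const resolvedKey = key ?? path.basename(localFile);
    if (resolvedKey in this.data) {
      throw new Error(`Key already exists in ConfigMap data: ${resolvedKey}`);
    }
    this.data[resolvedKey] = fs.readFileSync(localFile, 'utf8');
  }
}

// Usage: write a throwaway file to a temp directory, then embed it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'configmap-'));
const file = path.join(dir, 'query.js');
fs.writeFileSync(file, "console.log('hello');\n");

const queryData = new ConfigMapData();
queryData.addFile(file);
```

After the call, queryData.data holds a single 'query.js' entry with the file contents, which is the piece we previously had to produce with an out-of-band command.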

I now need to create a Volume from that ConfigMap, which is what the Volume.fromConfigMap() function does. With that in hand, all I need to do is create the container and use its mount() method:

const queryContainer = new kplus.Container({ 
  image: 'node:12.18.0-stretch',
  port: 8080, // the port query.js listens on
  command: [ 'node', 'query.js' ],
  env: {
    ELASTICSEARCH_ENDPOINT: kplus.EnvValue.fromValue('https://my.elasticsearch.cluster:9200/'),
  },
});

queryContainer.mount('/root', kplus.Volume.fromConfigMap(queryConfigMap));
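Notice that we never declared a pod-level volumes list. One way a library could infer it is to record each mount on the container and derive the pod spec from those records. The sketch below shows that idea with hypothetical stand-in types (VolumeSketch, ContainerSketch, toPodSpec). It is an assumption about the general technique, not a description of cdk8s+ internals.

```typescript
// Stand-ins for illustration only; these are not the cdk8s+ classes.
interface VolumeSketch {
  name: string;
  configMapName: string;
}

class ContainerSketch {
  public readonly mounts: { path: string; volume: VolumeSketch }[] = [];

  constructor(public readonly name: string) {}

  // Records the mount so that the volume travels with the container.
  public mount(path: string, volume: VolumeSketch): void {
    this.mounts.push({ path, volume });
  }
}

// Derives the pod-level `volumes` and per-container `volumeMounts` from the recorded mounts.
function toPodSpec(containers: ContainerSketch[]) {
  // De-duplicate by name so a volume shared by several containers is declared once.
  const volumesByName: Record<string, VolumeSketch> = {};
  for (const container of containers) {
    for (const m of container.mounts) {
      volumesByName[m.volume.name] = m.volume;
    }
  }

  return {
    volumes: Object.keys(volumesByName).map((name) => ({
      name,
      configMap: { name: volumesByName[name].configMapName },
    })),
    containers: containers.map((container) => ({
      name: container.name,
      volumeMounts: container.mounts.map((m) => ({ mountPath: m.path, name: m.volume.name })),
    })),
  };
}
```

Because the volume name is written once and reused for both the volumes entry and the volumeMounts entry, the two cannot drift apart the way they could in hand-written YAML.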

Next up, I’ll create a Deployment that will run 3 instances of this container:

const queryDeployment = new kplus.Deployment(this, 'QueryDeployment', {
  spec: {
    replicas: 3,
    podSpecTemplate: {
      containers: [queryContainer],
    },
  },
});

But this Deployment is a bit different from its cdk8s counterpart: it’s based on user intent. To understand what this means, let’s look at an excerpt from the manifest that cdk8s+ will synthesize for this deployment:

apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      cdk8s.deployment: QueryDeploymentC6A88652
  template:
    metadata:
      labels:
        cdk8s.deployment: QueryDeploymentC6A88652

You can see that the cdk8s.deployment selection label was automatically added. This is the Deployment construct interpreting our intent, which is that we want this deployment to create and control pods defined by the template property.
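The label value needs to be unique per deployment so that two deployments never select each other’s pods. How cdk8s+ computes it is not covered here. The sketch below simply assumes the construct’s id followed by a short hash of its path, which matches the shape of the value in the excerpt. The selectionLabelValue function and the hashing scheme are illustrative assumptions.

```typescript
import * as crypto from 'crypto';

// Derives a label value of the form <construct id><8 uppercase hex chars> from a construct path.
// The hashing scheme is an assumption for illustration, not the actual cdk8s algorithm.
function selectionLabelValue(constructPath: string): string {
  const id = constructPath.split('/').pop() ?? '';
  const hash = crypto
    .createHash('sha256')
    .update(constructPath)
    .digest('hex')
    .slice(0, 8)
    .toUpperCase();
  return `${id}${hash}`;
}
```

Hashing the full path rather than just the id means two constructs with the same id in different charts still get different label values, while the same path always yields the same value across synth runs.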

We now want to expose these 3 pods (i.e., the deployment) through a single network address. The Deployment construct offers an API to do just that:

queryDeployment.expose({port: 8000});

Again, notice what we didn’t have to do:

  • We didn’t have to specify any selectors.
  • We didn’t have to specify the container port.

Internally, this method will create a Service of type ClusterIP, and apply the correct selectors and ports. This is possible because the deployment already has all this information, and cdk8s+ implicitly uses it on my behalf. If I add the indexer deployment, the full cdk8s+ application definition looks like this:

import * as kplus from 'cdk8s-plus';
import * as cdk8s from 'cdk8s';

const app = new cdk8s.App();
const chart = new cdk8s.Chart(app, 'Search');

const image = 'node:12.18.0-stretch';
const elasticEndpoint = 'https://my.elasticsearch.cluster:9200/';

const queryConfigMap = new kplus.ConfigMap(chart, 'QueryConfigMap');
queryConfigMap.addFile(`${__dirname}/query.js`);

const indexerConfigMap = new kplus.ConfigMap(chart, 'IndexerConfigMap');
indexerConfigMap.addFile(`${__dirname}/indexer.js`);

const queryContainer = new kplus.Container({ 
  image: image,
  port: 8080, // the port query.js listens on
  command: [ 'node', 'query.js' ],
  env: {
    ELASTICSEARCH_ENDPOINT: kplus.EnvValue.fromValue(elasticEndpoint)
  }
});

const indexerContainer = new kplus.Container({ 
  image: image,  
  command: [ 'node', 'indexer.js' ],
  env: {
    ELASTICSEARCH_ENDPOINT: kplus.EnvValue.fromValue(elasticEndpoint),
    QUEUE_URL: kplus.EnvValue.fromValue('https://sqs.us-east-1.amazonaws.com/111111111/my-queue')
  }
});

queryContainer.mount('/root', kplus.Volume.fromConfigMap(queryConfigMap));
indexerContainer.mount('/root', kplus.Volume.fromConfigMap(indexerConfigMap));

const queryDeployment = new kplus.Deployment(chart, 'QueryDeployment', {
  spec: {
    replicas: 3,
    podSpecTemplate: {
      containers: [queryContainer]
    }
  }
});

queryDeployment.expose({port: 8000, type: kplus.ServiceType.NODE_PORT});

new kplus.Deployment(chart, 'IndexerDeployment', {
  spec: {
    replicas: 1,
    podSpecTemplate: {
      containers: [indexerContainer]
    }
  }
});

// synthesize the chart into a YAML manifest
app.synth();

Going back to our guiding principles now, let’s see where we’re at:

  • ✅  Desired State: Nothing has changed here; we still don’t need any external kubectl commands. In fact, we don’t even need to explicitly use readFileSync, since cdk8s+ will do that for us.
  • ✅  Don’t Repeat Yourself (DRY): Still good, our use of a programming language eliminates this issue.
  • ✅  Boilerplate: This code embodies the minimal amount of configuration needed to correctly deploy our application. All redundant information, such as selectors and pod spec volumes, is implicitly inferred.
  • ✅  Cognitive Load: We managed to greatly reduce the cognitive load since we were guided by intent-based APIs. These APIs reduce the amount of Kubernetes expertise needed to work with its resources.
  • ✅  Reusability: Same as before, we either publish an npm package or a generated YAML manifest (or both).

Summary

We started with a multitude of issues that arise from the limitations of YAML. We saw many of those issues disappear when we used cdk8s to rewrite our YAML definition in a programming language. We also saw that simply using a programming language was not enough, as it still carried a rather high cognitive load on the developer. We then started using the intent-based APIs provided by cdk8s+ and saw much of that load go away. Here is a recap of how well each approach addressed our guiding principles:

  Principle                      YAML    cdk8s   cdk8s+
  Desired State                  ❌      ✅      ✅
  Don’t Repeat Yourself (DRY)    ❌      ✅      ✅
  Boilerplate                    ❌      ❌      ✅
  Cognitive Load                 ❌      ❌      ✅
  Reusability                    ❌      ✅      ✅

Head over to our GitHub repo to try cdk8s+. We want to hear about as many use cases as possible and develop the library alongside the community. We also invite you to join the discussion on our Slack channel and on Twitter (#cdk8s, #cdk8s+).

Happy coding!