Containers

Tag: Kubernetes

Fault tolerant distributed machine learning training with the TorchElastic Controller for Kubernetes

Introduction Kubernetes enables machine learning teams to run training jobs distributed across fleets of powerful GPU instances like Amazon EC2 P3, reducing training time from days to hours. However, distributed training comes with limitations compared to the more traditional microservice based applications typically associated with Kubernetes. Distributed training jobs are not fault tolerant, and a […]

Optimizing Spark performance on Kubernetes

Apache Spark is an open source project that has achieved wide popularity in the analytical space. It is used by well-known big data and machine learning workloads such as streaming, processing wide array of datasets, and ETL, to name a few. Kubernetes is a popular open source container management system that provides basic mechanisms for […]

De-mystifying cluster networking for Amazon EKS worker nodes

Running Kubernetes on AWS requires an understanding of both AWS networking configuration and Kubernetes networking requirements. When you use the default Amazon Elastic Kubernetes Service (Amazon EKS) AWS CloudFormation templates to deploy your Amazon Virtual Private Cloud (Amazon VPC) and Amazon EC2 worker nodes, everything typically just works. But small issues in your configuration can result […]

Using EKS encryption provider support for defense-in-depth

Gyuho Lee, Rashmi Dwaraka, and Michael Hausenblas When we announced that we plan to natively support the AWS Encryption Provider in Amazon EKS, the feedback we got from you was pretty clear: can we have it yesterday? Now we’re launching EKS support for the encryption provider, a vital defense-in-depth security feature. That is, you can […]

Kubernetes Logging powered by AWS for Fluent Bit

September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. Centralized logging is an instrumental component of running and managing Kubernetes clusters at scale. Developers need access to logs for debugging and monitoring applications, operations teams need access for monitoring applications, and security needs access for monitoring. These teams have […]

Securing EKS Ingress With Contour And Let’s Encrypt The GitOps Way

This is a guest post by Stefan Prodan of Weaveworks. In Kubernetes terminology, Ingress exposes HTTP(S) routes from outside the cluster to services running within the cluster. An Ingress can be configured to provide Kubernetes services with externally-reachable URLs while performing load balancing and SSL/TLS termination. Kubernetes comes with an Ingress resource and there are several controllers that […]

Using ALB Ingress Controller with Amazon EKS on Fargate

In December 2019, we announced the ability to use Amazon Elastic Kubernetes Service to run Kubernetes pods on AWS Fargate. Fargate eliminates the need for you to create or manage EC2 instances for your Kubernetes applications. When your pods start, Fargate automatically allocates compute resources on-demand to run them. Fargate is great for running and […]

EKS VPC routable IP address conservation patterns in a hybrid network

Introduction Our customers are embracing containers and Kubernetes/EKS for the flexibility and the agility it affords their developers. As environments continue to scale, they want to find ways to more efficiently utilize their private RFC1918 IP address space. This post will review patterns to help conserve your RFC1918 IP address space with your EKS pods leveraging […]

Autoscaling EKS on Fargate with custom metrics

This is a guest post by Stefan Prodan of Weaveworks. Autoscaling is an approach to automatically scale up or down workloads based on the resource usage. In Kubernetes, the Horizontal Pod Autoscaler (HPA) can scale pods based on observed CPU utilization and memory usage. Starting with Kubernetes 1.7, an aggregation layer was introduced that allows third-party […]