Containers

Category: Amazon Elastic Kubernetes Service

How HP achieved 40% improvement in Kubernetes node utilization on Amazon EKS using Karpenter

Introduction This post was co-authored by Jon Lewis (SW R&D Director in HP), Gajanan Chandgadkar (Principal Cloud Operations Architect, HP), Rutvij Dave (Sr. Solutions Architect at AWS), Ratnopam Chakrabarti (Sr. Solutions Architect, Containers and Open-Source technologies at AWS), Apeksha Chauhan(Senior Technical Account Manager at AWS) and Chance Lee (Sr. Container Specialist Solutions Architect at AWS) […]

How Vannevar Labs cut ML inference costs by 45% using Ray on Amazon EKS

This blog is authored by Colin Putney (ML Engineer at Vannevar Labs), Shivam Dubey (Specialist SA Containers at AWS), Apoorva Kulkarni (Sr.Specialist SA, Containers at AWS), and Rama Ponnuswami (Principal Container Specialist at AWS). Vannevar Labs is a defense tech startup, successfully cut machine learning (ML) inference costs by 45% using Ray and Karpenter on Amazon Elastic Kubernetes Service (Amazon EKS). […]

Monitoring and automating recovery from AZ impairments in Amazon EKS with Istio and ARC Zonal Shift

Introduction Running microservice-style architectures in the cloud can quickly become a complex operation. Teams must account for a growing number of moving pieces such as multiple instances of independent workloads, along with their infrastructure dependencies. These components can then be distributed across different topology domains, such as multiple Amazon Elastic Compute Cloud (Amazon EC2) instances, […]

Amazon EKS enhances Kubernetes control plane observability

Introduction Amazon Elastic Kubernetes Service (Amazon EKS) is a managed Kubernetes service to run Kubernetes in the AWS cloud and on-premises data centers. In the cloud, Amazon EKS automatically manages the availability and scalability of the Kubernetes control plane nodes responsible for scheduling containers, managing application availability, storing cluster data, and other key tasks. However, maintaining […]

Amazon EKS and Kubernetes sessions at AWS re:Invent 2024

Introduction AWS re:Invent 2024, the annual Amazon Web Services conference, is fast approaching. This year’s event will feature a full track of sessions focused on Kubernetes and other cloud-native technologies. To help you navigate the extensive session catalog, we’ve compiled a list of sessions around Kubernetes and cloud-native related topics. They have been grouped by […]

Amazon EKS now supports Amazon Application Recovery Controller

Introduction Amazon Elastic Kubernetes Service (Amazon EKS) now supports Amazon Application Recovery Controller (ARC). ARC is an AWS service that allows you to prepare for and recover from AWS Region or Availability Zone (AZ) impairments. ARC provides two sets of capabilities: Multi-AZ recovery, which includes zonal shift and zonal autoshift, and multi-Region recovery, which includes routing […]

Amazon EKS optimized Amazon Linux 2023 accelerated AMIs now available

Introduction Earlier this year we announced support for Amazon EKS optimized AL2023 AMIs that provided many enhancements in terms of security and performance. Amazon Linux 2023 (AL2023) is the next generation of Amazon Linux from Amazon Web Services (AWS) and is designed to provide a secure, stable, and high-performance environment to develop and run your […]

Scaling a Large Language Model with NVIDIA NIM on Amazon EKS with Karpenter

Many organizations are building artificial intelligence (AI) applications using Large Language Models (LLMs) to deliver new experiences to their customers, from content creation to customer service and data analysis. However, the substantial size and intensive computational requirements of these models may have challenges in configuring, deploying, and scaling them effectively on graphic processing units (GPUs). […]

Inside Pinterest’s Custom Spark Job logging and monitoring on Amazon EKS: Using AWS for Fluent Bit, Amazon S3, and ADOT

In Part 1, we explored Moka’s high-level design and logging infrastructure, showcasing how AWS for Fluent Bit, Amazon S3, and a robust logging framework make sure of operational visibility and facilitate issue resolution. For more details, read part 1 here. Introduction As we transition to the second part of our series, our focus shifts to […]

Inside Pinterest’s Custom Spark Job logging and monitoring on Amazon EKS: Using AWS for Fluent Bit, Amazon S3, and ADOT

This is Part 1 of the blog post. Introduction Pinterest is a visual search and curation platform focused on inspiring users to create a life they love. Critical to the service are data insights, recommendations and machine learning (ML) models that are produced by synthesizing insights provided by the over 500 million monthly active users […]