AWS Cloud Operations Blog

Category: Management & Governance

Announcing Amazon CloudWatch Container Insights for Amazon EKS Windows Workloads Monitoring

Monitoring containerized applications requires precision and efficiency. As your applications scale, collecting and summarizing application and infrastructure metrics from your applications can be challenging. One way to handle this challenge is using Amazon CloudWatch Container Insights which is a single-click native monitoring tool provided by AWS. Amazon CloudWatch Container Insights helps customers collect, aggregate, and summarize […]

Leverage AWS Resilience Lifecycle Framework to assess and improve the resilience of application using AWS Resilience Hub

As more customers advance in their cloud adoption journey, they recognize that simply migrating applications to the cloud does not automatically ensure resilience. To ensure resilience, applications need to be designed to withstand disruptions from infrastructure, dependent services, misconfiguration and intermittent network connectivity issues. While many organizations understand the importance of building resilient applications, some […]

Create AWS Config rules efficiently with Generative AI

AWS Config enables businesses to assess, audit, and evaluate the configurations of their AWS resources by leveraging AWS Config rules that represent your ideal configuration settings.  For example a Security Group that allows ingress on port 22 should be marked as noncompliant. AWS Config provides predefined rules called managed rules to help you quickly get […]

Modernizing Account Management with Amazon Bedrock and AWS Control Tower

Introduction The integration of Generative AI into cloud governance transforms AWS account management into a more automated and efficient process. Leveraging the generative AI capabilities of Amazon Bedrock alongside tools such as AWS Control Tower and Account Factory for Terraform (AFT), organizations can now expedite the AWS account setup and management process, aligning with best […]

Securely share AWS CloudTrail Lake logs across accounts without replicating data

In 2022, we launched AWS CloudTrail Lake, an immutable managed data lake designed to simplify audit, security, and compliance investigations by capturing, storing, and analyze AWS user and API activities. By providing immutable storage for your activity logs, CloudTrail Lake protects the integrity of your audit data by providing read-only access. CloudTrail Lake integrates seamlessly […]

Automating Alerts for AWS Global Network Performance

Have your applications hosted on AWS ever experienced inter-Region or inter-Availability Zone (AZ) latency and you wanted to be proactively notified on these latency changes? This blog post describes an automated mechanism to set up those alarms. AWS has introduced the ability to understand the performance of the AWS Global Network by introducing Infrastructure Performance, […]

How BMW Group uses automation to achieve end-to-end compliance at scale on AWS

This post is co-written with Dr. Jens Kohl, Daniel Engelhardt, and Sascha Kallin from BMW Group. The BMW Group – headquartered in Munich, Germany – is a vehicle manufacturer with 149,000 employees worldwide and manufactures in over 30 production and assembly facilities across 15 countries. Today, the BMW Group (BMW) is the world’s leading manufacturer […]

Real User Monitoring with Amazon CloudWatch RUM and Amazon Managed Grafana

Real User Monitoring with Amazon CloudWatch RUM and Amazon Managed Grafana

In today’s fast-paced digital world, users expect fast and reliable web experiences. Slow-loading pages, errors, and other performance issues can lead to lower engagement and conversion rates, ultimately hurting a business’s bottom line. That’s where Real User Monitoring (RUM) comes in. Real User Monitoring (RUM) is a crucial aspect of modern web application development, allowing developers and […]

Identifying resilience drift using AWS Resilience Hub

Most people think of disaster recovery as a mechanism to protect their applications against big events. However, in the fast-paced world of development where new code and infrastructure changes are occurring several times a month, it is important to put mechanisms in place to proactively understand impacts to the resilience posture of your applications. In […]

VTEX scales to 150 million metrics using Amazon Managed Service for Prometheus

VTEX scales to 150 million metrics using Amazon Managed Service for Prometheus

VTEX is a multi-tenant platform with a distributed engineering operation. Observing hundreds of services in real time in an efficient manner is a technical challenge for the business. In this blog, we will show how VTEX created a resilient open source-based architecture aligned with a sharding strategy, using Amazon Managed Service for Prometheus (AMP) to […]