AWS Cloud Operations Blog
Category: Management Tools
The Mergers & Acquisitions Cloud Center of Excellence (M&A CCoE) – Part 1
The purpose of this two-part blog post series is to provide high-level guidance and principles to Mergers & Acquisitions (M&A) stakeholders on M&A Cloud Centers of Excellence (CCoEs). In this blog post, we introduce the concept of an M&A CCoE. In Part 2 of this series, we detail specific roles and responsibilities for M&A CCoEs. […]
Announcing AWS CloudTrail Lake one-year extendable retention pricing option
In 2022, Amazon Web Services (AWS) released AWS CloudTrail Lake, a managed audit and security lake that allows you to aggregate, immutably store, visualize, and query your activity logs for auditing, security investigation, and operational troubleshooting. Working backwards from our customers, we have added capabilities to CloudTrail Lake, such as the ability to copy CloudTrail events into […]
Monitoring MongoDB Atlas with AWS Managed Grafana and Amazon Managed Service for Prometheus
Many customers use MongoDB Atlas to store data from their modern business-critical applications. MongoDB Atlas provides a highly scalable, secure, highly available, and fully managed data platform. Operational monitoring of MongoDB Atlas clusters has a number of benefits. It helps prevent application downtime and customer disruptions by keeping MongoDB Atlas clusters healthy. MongoDB Atlas supports […]
Extend your Amazon Managed Grafana experience with Grafana community plugins
Today, Amazon Managed Grafana announces a new self-service plugin management experience for Grafana community plugins that enables you to unify data from a wider variety of data sources with visualizations tailored to analyze your unique datasets. Grafana community plugins provide an expansive array of tailor-made solutions to address diverse visualization use cases. With this release, […]
Use AWS Config inventory and compliance dashboards for a unified view of resource inventory and compliance
We recently announced AWS Config compliance and inventory dashboards, a new AWS Config feature that provides unified dashboards for AWS resource configurations and compliance across AWS accounts, AWS Regions, or an AWS Organization. In this blog post, I will walk you through the dashboards and widgets included in this launch. […]
Analyzing Amazon Lex conversation log data with Amazon Managed Grafana
To support business and internal processes, organizations are increasing their use of conversational interfaces. They offer opportunities for more availability, improved service levels, and reduced costs. As these conversational services become more important, so does the need to monitor the performance and effectiveness of these interfaces with analytics and dashboards. This analysis is used to drive […]
Know Before You Go – AWS re:Invent 2023 | AWS Management Console
New this year, the AWS Customer Experience team has tips to help you enhance your re:Invent experience and learn about various improvements that make AWS even easier to use. Meet us at our kiosks in the AWS Village and be sure to check out the sessions below. Our sessions will cover best practices for managing […]
Build a Cloud Automation Practice for Operational Excellence: Best Practices from AWS Managed Services
In today’s fast-paced business environment, organizations are actively pursuing operational excellence to maintain a competitive edge. Automation is a critical foundation for achieving better efficiency, reliability, and scalability in operations. However, integrating automation into cloud practice entails more than simply implementing software or tools. Building a cloud automation practice requires a transformative journey that […]
Creating a correction of errors document
This blog post walks you through an example of creating a Correction of Errors (COE) document. At Amazon, operational excellence is in our DNA. One best practice that we have learned at Amazon is to have a standard mechanism for post-incident analysis. The COE process facilitates learning from an event to avoid recurrences in […]
Monitoring GPU workloads on Amazon EKS using AWS managed open-source services
As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]