AWS Cloud Operations Blog
Category: Management Tools
Automating metrics collection on Amazon EKS with Amazon Managed Service for Prometheus managed scrapers
Managing and operating monitoring systems for containerized applications can be a significant operational burden for customers such as metrics collection. As container environments scale, customers have to split metric collection across multiple collectors, right-size the collectors to handle peak loads, and continuously manage, patch, secure, and operationalize these collectors. This overhead can detract from an […]
Ingesting administrative logs from Microsoft Azure to AWS CloudTrail Lake
In January 2023, AWS announced the support of ingestion for activity events from non-AWS sources using CloudTrail Lake. Making CloudTrail Lake a single location of immutable user and API activity events for auditing and security investigations. AWS CloudTrail Lake is a managed data lake for capturing, storing, accessing, and analyzing user and API activity on […]
Enable cloud operations workflows with generative AI using Agents for Amazon Bedrock and Amazon CloudWatch Logs
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible […]
Ten features for efficiently managing your AWS applications from Microsoft Teams and Slack using AWS Chatbot
Ten features in AWS Chatbot to help you understand your application health and resolve issues faster from chat channels.
Serverless Governance of Software Deployed with AWS Service Catalog
AWS Service Catalog (Service Catalog) is a powerful tool that empowers organizations to manage and govern approved services and resources. It significantly benefits platform engineering by standardizing environments, accelerating service delivery, and enhancing security. With its automated provisioning and resource management, Service Catalog supports infrastructure as code, enabling scalable, reliable deployments. Platform engineering teams are […]
Getting insights from Amazon Managed Service for Prometheus using natural language powered by Amazon Bedrock
As applications scale, customers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues. Organizations allocate money and developer time to deploy and manage various monitoring tools, while also dedicating considerable effort to training teams on their usage. When issues arise, operators navigate through […]
How Amazon CloudWatch Logs Data Protection can help detect and protect sensitive log data
Customer applications running on Amazon Web Services (AWS) often require handling sensitive data such as personally identifiable information (PII) or protected health information (PHI). As a result, sensitive log data can be intentionally or unintentionally logged as part of an application’s observability data. While comprehensive logging is important for application troubleshooting, monitoring and forensics, any […]
Visualize AWS Systems Manager Patch Manager information using Amazon QuickSight
In this blog post, learn how to build an Amazon QuickSight dashboard to visualize critical patch and inventory information to speed up MTTR. Also, you can use filters to search for a specific AWS Account, specific AWS Region, Amazon Elastic Compute Cloud (Amazon EC2) name, or check installed/missed packages. You want to visualize system patching […]
Leveraging AWS CloudTrail Insights for Proactive API Monitoring and Cost Optimization
AWS CloudTrail Insights is a powerful feature within AWS CloudTrail that helps organizations identify and respond to unusual operational activity in their AWS accounts. This includes identifying spikes in resource provisioning, bursts of IAM actions, or gaps in periodic maintenance activity. CloudTrail Insights continuously analyzes CloudTrail management events from trails and event data stores, establishing […]
Using Generative AI to Gain Insights into CloudWatch Logs
Have you ever been investigating a problem and opened up a log file and thought “I have no idea what I am looking at. If only I could get a summary of the data.” Observability and log data play an important role in maintaining operational excellence and ensuring the reliability of your applications and services. […]