AWS Cloud Operations Blog
Category: Monitoring and observability
Create fine-grained CloudWatch canary schedules with cron expressions
In this post, I’ll explain how to create fine-grained canary schedules to meet your business requirements using built-in cron expression scheduling in Amazon CloudWatch Synthetics. You can use CloudWatch Synthetics to create canaries, configurable scripts that run on a schedule, to monitor your endpoints and APIs. Because canaries follow the same routes and perform the […]
Implement operations observability in landing zone environments
In an earlier blog post, Automate customized deployment of cross-account/cross-region CloudWatch dashboards using tags, we showed you how to implement Amazon CloudWatch dashboards for specific events with automation. This solution is great for seasonal events, holidays, important releases, and other use cases. In this blog post, we will review a landing zone environment and share a […]
Use Amazon EventBridge rules to run AWS Systems Manager automation in response to CloudWatch alarms
Since its launch in 2009, Amazon CloudWatch has become the cloud-native choice for a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view […]
Improve monitoring of AWS Systems Manager Agent
The ability to present a single pane of glass simplifies the process of tracking and controlling IT systems. Enterprises that run workloads on AWS use AWS Systems Manager because of its security, ease of management, and centralized reporting. When an agent loses connection to the management platform, you can lose visibility into system behavior and […]
Managing and monitoring API throttling in your workloads
When you’re architecting for the cloud, you need to keep API throttling in mind, particularly the types of calls and the frequency with which they are called. When the allotted rate limit for an API call is exceeded, you’ll receive an error response and the call will be throttled. Excessive API throttling can result in […]
Cost optimization in AWS using Amazon CloudWatch metric streams, AWS Cost and Usage Reports and Amazon Athena
You can use metric streams to create continuous, near-real-time streams of Amazon CloudWatch metrics to a destination of your choice. Metric streams make it easier to send CloudWatch metrics to popular third-party service providers using an Amazon Kinesis Data Firehose HTTP endpoint. You can create a continuous, scalable stream that includes the most up-to-date CloudWatch […]
Monitor network throughput of interface VPC endpoints using Amazon CloudWatch
Security, cost, and performance are always top priorities for AWS customers when they design their networks. AWS PrivateLink is becoming increasingly popular because it provides secure private connectivity between Amazon Virtual Private Cloud (Amazon VPC), AWS services, and your on-premises networks, without exposing your traffic to the public internet. In this blog post, we show you […]
How The Washington Post’s Arc XP uses CloudWatch Metrics Explorer to reduce costs
This post describes how The Washington Post’s Arc XP uses Metrics Explorer to monitor its global SaaS platform and reduce costs.
Using Amazon CloudWatch with Amazon EventBridge for cross-account event monitoring
We often talk about event-driven architectures, where an event is something that happens within your application or architecture. It could be a new file received by your application or an alert triggered by high CPU utilization. We can act on these events by scanning the file contents or scaling out more […]
Automating the installation and configuration of Prometheus using Systems Manager documents
As organizations migrate workloads to the cloud, they want to ensure their teams spend more time on tasks that move the organization forward and less time managing infrastructure. Installing patches and configuring software is what AWS calls undifferentiated heavy lifting, or the hard IT work that doesn’t add value to the mission of the organization. […]