AWS Cloud Operations Blog

Category: Monitoring and observability

Create fine-grained CloudWatch canary schedules with cron expressions

In this post, I’ll explain how to create fine-grained canary schedules to meet your business requirements using built-in cron expression scheduling in Amazon CloudWatch Synthetics. You can use CloudWatch Synthetics to create canaries, configurable scripts that run on a schedule, to monitor your endpoints and APIs. Because canaries follow the same routes and perform the […]

Implement operations observability in landing zone environments

In an earlier blog post, Automate customized deployment of cross-account/cross-region CloudWatch dashboards using tags, we showed you how to implement Amazon CloudWatch dashboards for specific events with automation. This solution is great for seasonal events, holidays, important releases, and other use cases. In this blog post, we will review a landing zone environment and share a […]

Use Amazon EventBridge rules to run AWS Systems Manager automation in response to CloudWatch alarms

Since its launch in 2009, Amazon CloudWatch has become the cloud-native choice for a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view […]

Improve monitoring of AWS Systems Manager Agent

The ability to present a single pane of glass simplifies the process of tracking and controlling IT systems. Enterprises that run workloads on AWS use AWS Systems Manager because of its security, ease of management, and centralized reporting. When an agent loses connection to the management platform, you can lose visibility into system behavior and […]

Managing and monitoring API throttling in your workloads

When you’re architecting for the cloud, you need to keep API throttling in mind, particularly the types of calls you make and how frequently you make them. When the allotted rate limit for an API call is exceeded, the call is throttled and you receive an error response. Excessive API throttling can result in […]

Cost optimization in AWS using Amazon CloudWatch metric streams, AWS Cost and Usage Reports and Amazon Athena

You can use metric streams to create continuous, near-real-time streams of Amazon CloudWatch metrics to a destination of your choice. Metric streams make it easier to send CloudWatch metrics to popular third-party service providers using an Amazon Kinesis Data Firehose HTTP endpoint. You can create a continuous, scalable stream that includes the most up-to-date CloudWatch […]

Monitor network throughput of interface VPC endpoints using Amazon CloudWatch

Security, cost, and performance are always top priorities for AWS customers when they design their networks. AWS PrivateLink is becoming increasingly popular because it provides secure, private connectivity between Amazon Virtual Private Cloud (Amazon VPC), AWS services, and your on-premises networks, without exposing your traffic to the public internet. In this blog post, we show you […]

Using Amazon CloudWatch with Amazon EventBridge for cross-account event monitoring

We often talk about event-driven architectures, where an event is something that happens within your application or architecture. It could be a new file received by your application or an alert triggered by high CPU utilization. We can act on these events by scanning the file contents or scaling out more […]

Automating the installation and configuration of Prometheus using Systems Manager documents

As organizations migrate workloads to the cloud, they want to ensure their teams spend more time on tasks that move the organization forward and less time managing infrastructure. Installing patches and configuring software is what AWS calls undifferentiated heavy lifting, or the hard IT work that doesn’t add value to the mission of the organization. […]