AWS Cloud Operations Blog

Category: Management & Governance

Use AWS Systems Manager Automation runbooks to resolve operational tasks

Use AWS Systems Manager Automation runbooks to resolve operational tasks

OpsCenter provides a central location where operations engineers and IT professionals can view, investigate, and resolve operational work items (OpsItems) related to AWS resources. AWS Systems Manager Automation simplifies common maintenance and deployment tasks for Amazon Elastic Compute Cloud (Amazon EC2) instances and other AWS resources. You can use this capability to build automations to […]

Automating the installation and configuration of Prometheus using Systems Manager documents

Automating the installation and configuration of Prometheus using Systems Manager documents

As organizations migrate workloads to the cloud, they want to ensure their teams spend more time on tasks that move the organization forward and less time managing infrastructure. Installing patches and configuring software is what AWS calls undifferentiated heavy lifting, or the hard IT work that doesn’t add value to the mission of the organization. […]

Collecting Apache Flink metrics in the Amazon CloudWatch agent

Collecting Apache Flink metrics in the Amazon CloudWatch agent

Apache Flink is a distributed stream processing engine. You can run Flink on Amazon EMR as a YARN application. You can view Flink metrics through its web UI, but what if you want to react to them? In this blog post, I’ll show you how to use the CloudWatch agent to collect Flink metrics into […]

Monitoring hybrid environments using Amazon Managed Service for Grafana

Monitoring hybrid environments using Amazon Managed Grafana

Setting up observability for workloads is critical to tracking application performance, reliability, and health. It’s even more important when you’re dealing with workloads that are deployed in hybrid environments. A proliferation of monitoring tools can result in data silos or multiple single panes of glass. When an organization loses its consolidated view,  whether it be across […]

Monitor Amazon EventBridge events in your Slack channels with AWS Chatbot

DevOps teams use chat collaboration platforms such as Slack and Amazon Chime to monitor systems and respond to events. When AWS Chatbot is integrated with Slack and Chime, users can monitor and interact with AWS resources from the chat channels, which reduces context switching between applications. DevOps users now can receive notifications from more than […]

Featured Image for Blog Post with title 'Use AWS CloudWatch Contributor Insights to monitor AWS Foundations Benchmark Controls"

Use AWS CloudWatch Contributor Insights to monitor CIS AWS Foundations Benchmark controls

Contributor Insights is a feature of AWS CloudWatch that can be used to analyze log data to create time series that displays contributor data. This will help you understand who or what is impacting your system and application performance by identifying top talkers, pinpointing outliers, finding the heaviest traffic patterns, and ranking the top system […]

Using AWS Control Tower, AWS Service Catalog, and AWS Marketplace to deploy AWS Marketplace license subscriptions

Using AWS Control Tower, AWS Service Catalog, and AWS Marketplace to deploy AWS Marketplace license subscriptions

Enterprise customers with multiple AWS accounts want to subscribe once to an AWS Marketplace product and have all accounts in the organization deploy AWS Marketplace solutions without needing each account to subscribe first. AWS Control Tower helps customers create accounts and manage many account configurations and best practices. AWS Service Catalog helps customers deploy AWS […]

post featured image with title "Introducing CloudWatch Resource health to monitor your EC2 hosts"

Introducing CloudWatch Resource Health to monitor your EC2 hosts

Today, AWS announced Amazon CloudWatch Resource Health, a fully managed solution that customers can use to automatically discover, manage, and visualize the health and performance of Amazon Elastic Compute Cloud (Amazon EC2) hosts across their applications. Resource Health provides a centralized view of your EC2 hosts by performance dimensions such as CPU or memory utilization. […]

Behind the scenes as AWS AppConfig builds a Lambda extension

Behind the scenes as AWS AppConfig builds a Lambda extension

In this blog post, I will share why the AWS AppConfig team built an AWS Lambda extension (hint: customers wanted it), the effort required to build it (hint: it was easy), and the outcomes of building our Lambda extension (hint: lots). I will cover the technical and business aspects of building a Lambda extension and […]

Using VPC endpoints for AWS X-Ray

Today, AWS X-Ray announces the general availability of VPC endpoint support, which makes it possible for you to establish a private connection between your VPC and AWS X-Ray. Applications running in your VPC can now communicate with AWS X-Ray to send trace data without going through the public internet. In this post, I will show […]