AWS Cloud Operations Blog

Category: AWS Systems Manager

AWS AppConfig: The Amazon service that helps you scale for large events like Prime Day

AWS AppConfig: The Amazon service that helps you scale for large events like Prime Day

Amazon uses a number of AWS services to help meet increased traffic and demand during Prime Day events. As Jeff Barr has mentioned in his previous blog posts, some key services used in Prime Day include: Amazon DynamoDB handles the trillions of Prime Day requests. Amazon Interactive Video Service (Amazon IVS) enables shoppers to shop […]

Create a Jira issue using an AWS Config remediation action

Create a Jira issue using an AWS Config remediation action

AWS Config can create issue entries in the Jira Service Management platform when it determines an AWS resource is noncompliant. In this blog post, I show you how to configure an AWS Config rule to create a Jira issue after the rule detects a noncompliant AWS resource. I also share Jira Service Desk configuration changes […]

Aggregate operational tasks with AWS Systems Manager Explorer and OpsCenter

Aggregate operational tasks with AWS Systems Manager Explorer and OpsCenter

AWS Systems Manager Explorer is a customizable operations dashboard that reports information about your AWS resources. Explorer displays an aggregated view of operations data (OpsData) for your AWS accounts and across AWS Regions. Explorer provides context into how operational issues are distributed, trend over time, and vary by category. In this blog post, we explain […]

Remediate noncompliant AWS Config rules with AWS Systems Manager Automation runbooks

Remediate noncompliant AWS Config rules with AWS Systems Manager Automation runbooks

AWS Config is used to assess, audit, and evaluate the configuration of your AWS resources. You can use a set of AWS Config managed rules for common compliance scenarios or you can create your own rules for custom scenarios. In this blog post, I explain how AWS Systems Manager Explorer gathers the compliance status of […]

Automating the installation and configuration of Prometheus using Systems Manager documents

Automating the installation and configuration of Prometheus using Systems Manager documents

As organizations migrate workloads to the cloud, they want to ensure their teams spend more time on tasks that move the organization forward and less time managing infrastructure. Installing patches and configuring software is what AWS calls undifferentiated heavy lifting, or the hard IT work that doesn’t add value to the mission of the organization. […]

Collecting Apache Flink metrics in the Amazon CloudWatch agent

Collecting Apache Flink metrics in the Amazon CloudWatch agent

Apache Flink is a distributed stream processing engine. You can run Flink on Amazon EMR as a YARN application. You can view Flink metrics through its web UI, but what if you want to react to them? In this blog post, I’ll show you how to use the CloudWatch agent to collect Flink metrics into […]

Using AWS Control Tower, AWS Service Catalog, and AWS Marketplace to deploy AWS Marketplace license subscriptions

Using AWS Control Tower, AWS Service Catalog, and AWS Marketplace to deploy AWS Marketplace license subscriptions

Enterprise customers with multiple AWS accounts want to subscribe once to an AWS Marketplace product and have all accounts in the organization deploy AWS Marketplace solutions without needing each account to subscribe first. AWS Control Tower helps customers create accounts and manage many account configurations and best practices. AWS Service Catalog helps customers deploy AWS […]

automated operations cloud operating model

Reinventing automated operations (Part II)

The first post in this series, Reinventing automated operations (Part I), covered the importance of operations in the cloud and how deferring the creation of an operations plan can slow down your migration. In this post, I’ll share the primary mechanism of iterative improvement (aka flywheel) that AWS Managed Services (AMS) uses to increase operational […]

Detecting and remediating process issues on EC2 instances using Amazon CloudWatch and AWS Systems Manager

Detecting and remediating process issues on EC2 instances using Amazon CloudWatch and AWS Systems Manager

Customers want to have visibility into processes running inside their Amazon Elastic Compute Cloud (Amazon EC2) instances. Critical processes and services in these instances can crash unexpectedly and when they do, it’s crucial for customers to be notified so they can maintain continued business operations. There are multiple ways to see if a service is […]

Creating contacts, escalation plans, and response plans in AWS Systems Manager Incident Manager

Creating contacts, escalation plans, and response plans in AWS Systems Manager Incident Manager

Many of our customers need an effective incident management and response solution to achieve operational excellence and performance efficiency. Transparency between those who are affected by the incident and those who respond to the incident is key to any incident management process. Finding the right team to mitigate the impact of application or workload incidents […]