AWS Cloud Operations Blog

Category: Management & Governance

Chaos engineering leveraging AWS Fault Injection Simulator in a multi-account AWS environment

Large-scale distributed software systems in the cloud are composed of several individual sub-systems—such as CDNs, load balancers, web servers, application servers and databases—as well as their interactions. The interactions sometimes have unpredictable outcomes caused by unforeseen events (for example, a network failure, instance failure, etc.). These events can lead to system-wide failures of your critical […]

How McAfee used Amazon CloudWatch to monitor a multi-PB data migration to Databricks on AWS

This blog post was contributed by Kanishk Mahajan@AWS; Hashem Raslan, Manager, Engineering@McAfee; Anastasia Zamyshlyaeva, Vice President, Data Engineering@McAfee McAfee, a global leader in online protection security enables home users and businesses to stay ahead of fileless attacks, viruses, malware, and other online threats. McAfee wanted to create a centralized data platform as a single source […]

Enforce best practices in AWS Systems Manager documents leveraging CFN Guard

Many of us use AWS Systems Manager (SSM) documents to help automate various tasks. As we author documents and move them toward deployment, we’ll likely enforce certain standards and best practices. The AWS CloudFormation team released a general-purpose tool called AWS CloudFormation Guard that we can use to help enforce these best practices. In this […]

Gaining more control over Multi-Regional AWS CloudFormation deployments

Routinely deploying resources to multiple regions is increasingly normal for situations like Disaster Recovery (DR), regulatory and compliance, and end-user latency requirements. Keeping multiple environments in sync is challenging and drives Infrastructure as Code (IaC) adoption through services like AWS CloudFormation. This post demonstrates a generic design pattern for orchestrating multi-Regional deployments when you need […]

Integrate Amazon CloudWatch metrics with ServiceNow using Amazon Managed Grafana

ServiceNow ITSM is a cloud-based platform designed to improve IT services, increase user satisfaction, and boost IT flexibility and agility. With ServiceNow IT Service Management, you can consolidate your legacy on-premise systems and IT tools into our single data model to transform the service experience, automate workflows, gain real-time visibility, and improve IT productivity. Amazon […]

Manage AWS resources in your Slack channels with AWS Chatbot

**This post was written while the feature to manage AWS resources in Slack channels was in public preview. This feature is now generally available. The information contained within this post is still relevant and helpful.** DevOps and engineering teams are increasingly moving their operations, system management, and CI/CD workflows to chat applications to streamline activities […]

Migrate On-Premises Multi-Tenant Systems to Amazon Elastic Kubernetes Service

Managing the deployment of containers in a multi-tenant environment presents a number of new challenges for many of my customers. Some organizations have explored building and managing their own Kubernetes container orchestration environment, but the management challenges lead them to evaluate Amazon Elastic Kubernetes Service (Amazon EKS). Particularly, Independent Software Vendors (ISVs) are using a […]

Monitoring Amazon EMR on EKS with Amazon Managed Prometheus and Amazon Managed Grafana

Apache Spark is an open-source lightning-fast cluster computing framework built for distributed data processing. With the combination of Cloud, Spark delivers high performance for both batch and real-time data processing at a petabyte scale. Spark on Kubernetes is supported from Spark 2.3 onwards, and it gained a lot of traction among enterprises for high performance and […]

Customize Well-Architected Reviews using Custom Lenses and the AWS Well-Architected Tool

The AWS Well-Architected Tool (AWS WA Tool) lets you learn best practices for architecting workloads on the cloud, measure workloads against these best practices, and improve the workload by implementing best practices. These best practices have been curated under the AWS Well-Architected Framework (AWS WA Framework) and Lenses based on our tens of thousands of […]

Codify your best practices using service control policies: Part 2

I introduced the fundamental concepts of service control policies (SCPs) in the previous post. We discussed what SCPs are, why you should create SCPs, the two approaches you can use to implement SCPs, and how to iterate and improve SCPs as your workload and business needs change. In this post, I will discuss how you […]