AWS Cloud Operations Blog
How Moody’s uses AWS Systems Manager to patch servers across multiple cloud providers
Introduction
Enterprises today continue to face challenges maintaining an inventory of all of their infrastructure. They need to ensure timely patching of their servers spread across their on-premises and cloud environments using the same set of tools.
In this guest blog post, Divya Elaty, VP, Cloud Engineering at Moody’s, and Sarat Guttikonda, Global Solutions Architect at AWS, explain how they use AWS Systems Manager as the preferred tool for patching their hybrid environments, which use AWS and Azure.
Problem statement
With our experience at Moody’s patching hundreds and thousands of servers in our data centers, spread across multiple geographical locations, we understand the challenges of maintaining an inventory of all servers, operating systems, and their patch levels. Servers owned and managed by different teams have different patching strategies, schedules, and severities defined. The biggest challenge is to find a single tool or a mechanism to check if all the servers are compliant with a particular defined patch level for a specific operating system, and patch them in time. This challenge becomes even more complicated when we need to expand to multiple cloud platforms.
After looking at and exploring a few available products, we have decided to try AWS Systems Manager to gather inventory, define patch baselines, and patch servers across multiple cloud providers from a centralized AWS Systems Manager console. We’re also using Systems Manager to review if all the servers are compliant with the baselines we have defined, and act on those that are non-compliant.
AWS Systems Manager is a service that makes it easier to configure and manage your Amazon EC2 instances, on-premises servers and virtual machines, and other AWS resources at scale. Systems Manager gives you a complete view of your infrastructure performance and configuration, simplifies resource and application management, and makes it easy to operate and manage your infrastructure at scale.
In this blog post we’ll walk you through the architecture, workflow, and step-by-step approach of how we at Moody’s configured AWS Systems Manager as the desired tool for patching servers across a hybrid environment.
Architecture
AWS implementation:
- Workflows:
  - Scan – This workflow is triggered by a scheduled Amazon CloudWatch event, which invokes a Systems Manager Automation document that runs a patch baseline (AWS-RunPatchBaseline) with a “Scan” operation to find EC2 instances that are NON_COMPLIANT.
    - Run frequency – Once a week.
  - Check Systems Manager role – This workflow is triggered by a scheduled Amazon CloudWatch event, which invokes an AWS Lambda function to check whether all the EC2 instances have the required Automation role assigned to them.
    - If an instance does not have the Systems Manager role attached, the Lambda function attaches the required role.
    - If an instance has a role attached but the role doesn’t have Systems Manager permissions, a policy with Systems Manager permissions is attached to the role.
    - Run frequency – EC2 Event based actions.
  - Patch – This workflow is triggered by a scheduled CloudWatch event, which invokes an Automation document that runs a patch baseline (AWS-RunPatchBaseline) and does the following:
    - Check if there are any stopped instances with a specific tag.
    - If there are stopped instances, start them, and update the AWS Systems Manager Agent on them.
    - Patch all instances.
    - Stop only the instances that were down before the workflow started.
    - Run frequency – Once every two weeks or at a time that depends on your patch cycle.
- In Systems Manager, Amazon EC2 instances with a specific tag are grouped using Resource Groups. These groups can be Linux Groups (Patch Group A), Windows Groups (Patch Group B), or AppA Groups (Patch Group C).
  - Linux Patch Group – EC2 Linux instances grouped to be patched by a defined Linux patch baseline.
  - Windows Patch Group – EC2 Windows instances grouped to be patched by a defined Windows patch baseline.
  - AppA Patch Group – EC2 instances related to an application AppA grouped to be patched by a defined patch baseline.
- AWS Systems Manager Inventory is used to collect operating system (OS), application, and instance metadata from the Amazon EC2 instances.
- AWS Config rules check whether the AWS Systems Manager association compliance status is COMPLIANT or NON_COMPLIANT after the association runs on the instance.
- Notifications are sent via Amazon Simple Notification Service (Amazon SNS) if there are any NON_COMPLIANT instances.
- Remediations can be applied based on the NON_COMPLIANT findings.
- Logs from Systems Manager Inventory and Patch processes are stored in Amazon S3.
- Amazon QuickSight is used to create and publish interactive dashboards that can be accessed from browsers or mobile devices.
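The run frequencies of the Scan and Patch workflows can be written as CloudWatch Events schedule expressions. The following is a minimal sketch, not our production configuration: the rule names are hypothetical, and the dictionaries are shaped as keyword arguments for the CloudWatch Events PutRule API.

```python
# Hypothetical rule names. The rate() values mirror the run frequencies
# described in the architecture: a weekly scan and a patch run every two weeks.
SCHEDULES = {
    "ssm-patch-scan": "rate(7 days)",
    "ssm-patch-install": "rate(14 days)",
}


def build_rule_params(name, schedule):
    """Return keyword arguments shaped for the CloudWatch Events PutRule API."""
    return {"Name": name, "ScheduleExpression": schedule, "State": "ENABLED"}


rules = [build_rule_params(name, expr) for name, expr in SCHEDULES.items()]

# With boto3 installed and credentials configured, each rule could be created with:
#   boto3.client("events").put_rule(**params)
```

Keeping the schedules in one mapping makes it easy to adjust them when your patch cycle changes.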
Azure implementation:
With Azure virtual machines (VMs), everything is the same as with AWS, except that the second workflow is “Check Azure VM Tags” instead of “Check Systems Manager Role.”
- Workflow:
  - Check Azure VM Tags – This workflow, which is triggered by a scheduled CloudWatch event, invokes Azure Automation to check whether all the VMs have the required tags.
    - If a VM doesn’t have the tag, the Automation adds the tag.
    - This workflow also exports the Azure VM tags.
    - Run frequency – Once a day.
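The core of the tag check is a comparison between the tags a VM carries and the tags it must carry. The sketch below illustrates only that comparison in plain Python. The tag key and value are hypothetical, and the calls that read and write tags on Azure VMs are intentionally left out.

```python
# Hypothetical required tags; the actual tag names depend on your environment.
REQUIRED_TAGS = {"Patch Group": "Azure-Linux"}


def missing_tags(vm_tags, required=REQUIRED_TAGS):
    """Return the required tags that the VM does not carry yet."""
    return {key: value for key, value in required.items() if key not in vm_tags}


def apply_required_tags(vm_tags, required=REQUIRED_TAGS):
    """Return a copy of the VM's tags with any missing required tags added."""
    return {**vm_tags, **missing_tags(vm_tags, required)}
```

Existing tag values are left untouched, so a VM that already has a “Patch Group” tag keeps its own value.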
How to use Systems Manager for patching servers using an on-premises, multi-cloud, multi-account, and multi-Region infrastructure setup
What’s used
- 3 AWS Accounts
  - A Security Account providing centralized access to all servers
  - A Shared Services Account hosting a few EC2 instances
  - An Application Account hosting a few EC2 instances
- 2 AWS Regions in the AWS accounts
- 1 Microsoft Azure Account hosting a few VMs
In this section, we’ll walk you through the following:
- Setting up the AWS Systems Manager Agent on the servers
- Gathering inventory of all servers in AWS Systems Manager
- Patching servers from a centralized AWS Management Console, using AWS default Patch Baselines
- Checking compliance of all servers against pre-defined patch baselines
1. Setting up AWS Systems Manager
Prerequisites
- Before you start to use Systems Manager, we recommend reviewing the prerequisites.
- Ensure that the servers you are going to use run the supported operating systems.
Amazon S3 setup
Create two S3 buckets in every Region and in every managed account to store logs from patch-baseline and inventory actions.
Account1-Region1:
arn:aws:s3:::ss-oregon-patch-baseline-snapshot
arn:aws:s3:::ss-oregon-ec2-inventory
Account1-Region2:
arn:aws:s3:::ss-london-patch-baseline-snapshot
arn:aws:s3:::ss-london-ec2-inventory
Account2-Region1:
arn:aws:s3:::tc-oregon-patch-baseline-snapshot
arn:aws:s3:::tc-oregon-ec2-inventory
Account2-Region2:
arn:aws:s3:::tc-london-patch-baseline-snapshot
arn:aws:s3:::tc-london-ec2-inventory
Create an S3 bucket in the master account in a single Region of your choice to store resource data sync information.
Account3-Region1:
arn:aws:s3:::sec-virginia-ssm-data-sync
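The log bucket names above follow an account-Region-purpose pattern, so the full list can be generated rather than typed by hand. This is a small sketch that assumes the same prefixes and labels as the listing; adapt them to your own naming scheme.

```python
ACCOUNT_PREFIXES = ["ss", "tc"]        # the two managed accounts in the listing
REGION_LABELS = ["oregon", "london"]   # one label per Region
LOG_SUFFIXES = ["patch-baseline-snapshot", "ec2-inventory"]


def log_bucket_arns(accounts=ACCOUNT_PREFIXES, regions=REGION_LABELS,
                    suffixes=LOG_SUFFIXES):
    """Return one bucket ARN per account, Region, and log type."""
    return [
        f"arn:aws:s3:::{account}-{region}-{suffix}"
        for account in accounts
        for region in regions
        for suffix in suffixes
    ]


bucket_arns = log_bucket_arns()
```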
Network setup
Assumptions:
- Virtual Private Clouds (VPCs) in two Regions (for example, Oregon (us-west-2) and London (eu-west-2)) are created in two accounts.
- Each VPC has two public and two private subnets.
Security group for VPC endpoints:
- In the AWS Management Console, navigate to the VPC Dashboard, and create a security group for VPC endpoints.
- Add an inbound rule in the security group created for the VPC endpoints to allow HTTPS (port 443) from the security group of private instances.
Security group for EC2 instances in the private subnets:
- Navigate to the VPC Dashboard, and create a security group. Or you can use an existing security group available for your EC2 instances in the private subnets.
- No additional inbound rules are required in the security group created for the EC2 instances in the private subnets.
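If you prefer to script the inbound rule instead of using the console, the rule can be described as arguments for the Amazon EC2 AuthorizeSecurityGroupIngress API. The following is a sketch only; the security group IDs are placeholders, not values from our environment.

```python
def https_from_group(endpoint_sg_id, instance_sg_id):
    """Build AuthorizeSecurityGroupIngress arguments that allow HTTPS (443)
    into the endpoint security group from the instances' security group."""
    return {
        "GroupId": endpoint_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": 443,
                "ToPort": 443,
                "UserIdGroupPairs": [{"GroupId": instance_sg_id}],
            }
        ],
    }


# Placeholder IDs for illustration.
ingress = https_from_group("sg-endpoints-example", "sg-private-example")

# With boto3 installed and credentials configured:
#   boto3.client("ec2").authorize_security_group_ingress(**ingress)
```

Referencing the instances' security group instead of a CIDR range keeps the rule valid when subnets or IP addresses change.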
Configuring access to Systems Manager
- Step 1: Configure one or more users who will use Systems Manager.
- Step 2: Create an instance profile that allows Systems Manager to perform actions on your instances.
  - Role name: AWS-SSM-Role (this can be any name you like)
- Step 3: Launch EC2 instances that use the instance profile you created in the previous step.
  - Create the EC2 instances with the following parameters:
    - IAM Role – Choose the IAM role (AWS-SSM-Role), with the AmazonEC2RoleforSSM Managed Policy attached.
    - Subnets – Choose the private subnets.
    - Security Group – Choose the security group created for EC2 instances in the private subnets.
    - AWS Systems Manager Agent – AWS Systems Manager Agent is preinstalled by default on some Amazon-provided Amazon Machine Images (AMIs), such as Amazon Linux. On other images, install the agent manually.
    - Tags – Create tags to group instances in Systems Manager.
      - Example tag: Key “Patch Group”, Value “SSM-Private”
- Step 4: You can improve the security posture of your managed instances (including managed instances in your hybrid environment) by configuring Systems Manager to use an interface VPC endpoint. Interface endpoints are powered by AWS PrivateLink, a technology that enables you to privately access Amazon EC2 and Systems Manager APIs by using private IP addresses. PrivateLink restricts all network traffic between your managed instances, Systems Manager, and Amazon EC2 to the Amazon network. (Managed instances don’t have access to the internet.) Also, you don’t need an internet gateway, a NAT device, or a virtual private gateway.
For more information, see the documentation.
Please read the restrictions and limitations of using VPC Endpoints.
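As an illustration of Step 4, the sketch below builds the arguments for one interface endpoint per service, shaped for the Amazon EC2 CreateVpcEndpoint API. The list of services is an assumption based on the Systems Manager VPC endpoint documentation and may not be complete for your setup, and all resource IDs are placeholders.

```python
# Assumed set of services; verify against the current Systems Manager
# documentation for your Region before relying on it.
SSM_ENDPOINT_SERVICES = ("ssm", "ec2messages", "ssmmessages", "ec2")


def endpoint_params(region, vpc_id, subnet_ids, security_group_id):
    """Return one CreateVpcEndpoint argument set per required service."""
    return [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": [security_group_id],
            "PrivateDnsEnabled": True,
        }
        for service in SSM_ENDPOINT_SERVICES
    ]


# Placeholder IDs for illustration.
endpoints = endpoint_params(
    "us-west-2", "vpc-example", ["subnet-a", "subnet-b"], "sg-endpoints-example"
)

# With boto3 installed and credentials configured:
#   for params in endpoints:
#       boto3.client("ec2", region_name="us-west-2").create_vpc_endpoint(**params)
```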
Setting Up Systems Manager in hybrid environments
Systems Manager lets you remotely and securely manage on-premises servers and virtual machines (VMs) in your hybrid environment. To configure your hybrid environment for Systems Manager, see the documentation.
After you finish, your hybrid machines that are configured for Systems Manager are listed in the AWS Systems Manager console and described as managed instances. Amazon EC2 instances configured for Systems Manager are also managed instances.
In the AWS Systems Manager console, hybrid instances have the prefix “mi-”, while Amazon EC2 instances have the prefix “i-”.
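The prefix is handy when you post-process inventory or compliance data and want to separate hybrid machines from EC2 instances. A minimal sketch of that check (the function name and labels are ours, chosen for illustration):

```python
def instance_kind(instance_id):
    """Classify a managed instance by its ID prefix."""
    if instance_id.startswith("mi-"):
        return "hybrid"
    if instance_id.startswith("i-"):
        return "ec2"
    return "unknown"
```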
2. Gathering inventory
AWS Systems Manager Inventory
AWS Systems Manager Inventory provides visibility into your Amazon EC2 and on-premises computing environment. You can use Inventory to collect metadata from your managed instances. You can store this metadata in a central Amazon Simple Storage Service (Amazon S3) bucket, and then use built-in tools to query the data and quickly determine which instances are running the software and configurations required by your software policy, and which instances need to be updated. You can configure Inventory on all of your managed instances by using a one-click procedure. You can also configure and view inventory data from multiple AWS Regions and accounts.
Configuring Resource Data Sync for Inventory
You can use Systems Manager Resource Data Sync to send Inventory data collected from all of your managed instances to a single Amazon S3 bucket.
If you have not configured Resource Data Sync for Inventory, you either need to manually gather the collected inventory data for each instance, or you have to create scripts to gather this information. You would then need to port the data into an application so that you can run queries and analyze it.
With Resource Data Sync, you perform a one-time operation that synchronizes all Inventory data from all of your managed instances. After the sync is successfully created, Systems Manager creates a baseline of all Inventory data and saves it in the target Amazon S3 bucket. When new inventory data is collected, Systems Manager automatically updates the data in the Amazon S3 bucket. You can then quickly and cost-effectively port the data to Amazon Athena and Amazon QuickSight.
Set up Inventory and Resource Data Sync for all the managed accounts within the desired Regions.
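A Resource Data Sync can also be created through the Systems Manager CreateResourceDataSync API. The sketch below only builds the arguments. The sync name is hypothetical, and us-east-1 is our assumption for the Region of the centralized bucket, inferred from its name.

```python
def data_sync_params(sync_name, bucket_name, bucket_region):
    """Build CreateResourceDataSync arguments for an S3 destination."""
    return {
        "SyncName": sync_name,
        "S3Destination": {
            "BucketName": bucket_name,
            "SyncFormat": "JsonSerDe",
            "Region": bucket_region,
        },
    }


# Hypothetical sync name; the Region is an assumption based on the bucket name.
sync = data_sync_params("inventory-sync", "sec-virginia-ssm-data-sync", "us-east-1")

# With boto3 installed and credentials configured, run in each managed
# account and Region:
#   boto3.client("ssm").create_resource_data_sync(**sync)
```

When the bucket lives in a different account, its bucket policy must allow Systems Manager to write to it; that policy is outside the scope of this sketch.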
Verifying the setup
Perform the following verification in your master account (and Region), because the inventory data is collected in the centralized S3 bucket in the master account.
Log in to the master account, and go to the Amazon S3 console. Navigate through the centralized bucket that you created (sec-virginia-ssm-data-sync).
Instance metadata will be captured and sorted in folder structures by Parameter, AccountID, and the Region. You can drill down to see the information about each instance.
3. Patching with an Automation document invoked by Amazon CloudWatch
- Step 1: The Automation document runs a DescribeInstances API call to get the list of stopped instances based on the tags passed as inputs, and saves the IDs of the instances that must be stopped again at the end of the workflow.
  - Checks if there are any stopped instances.
- Step 2: If there are stopped instances, the StartInstances API action starts them.
- Step 3: Updates the AWS Systems Manager Agent on the instances from Step 2.
- Step 4: Patches the instances from Step 2 and the running instances by using the AWS-RunPatchBaseline document.
- Step 5: Stops the instances that were started in Step 2.
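The bookkeeping behind these steps is simple: remember which instances were stopped, patch everything, then stop only the remembered ones. The following plain-Python sketch models that logic. It is not the Automation document itself, and the input shape (a list of dictionaries with "id" and "state" keys) is our own simplification.

```python
def plan_patch_run(instances):
    """Work out which instances each step of the workflow acts on.

    instances: list of {"id": str, "state": "running" or "stopped"}.
    """
    stopped = [i["id"] for i in instances if i["state"] == "stopped"]
    running = [i["id"] for i in instances if i["state"] == "running"]
    return {
        "start": list(stopped),           # Step 2
        "update_agent": list(stopped),    # Step 3
        "patch": stopped + running,       # Step 4
        "stop": list(stopped),            # Step 5: only what was down before
    }


plan = plan_patch_run([
    {"id": "i-aaa", "state": "running"},
    {"id": "i-bbb", "state": "stopped"},
])
```

A real run also has to wait for instances to reach the running state and for patching to finish before stopping anything; that waiting is omitted here.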
4. Checking Compliance
You can use AWS Systems Manager Configuration Compliance to scan your fleet of managed instances for patch compliance and configuration inconsistencies. You can collect and aggregate data from multiple AWS accounts and Regions, and then drill down into specific resources that aren’t compliant. Systems Manager Compliance offers the following additional benefits and features:
- View compliance history and change tracking for Patch Manager patching data and State Manager associations by using AWS Config.
- Customize Systems Manager Compliance to create your own compliance types based on your IT or business requirements.
- Remediate issues by using Systems Manager Run Command, State Manager, or Amazon CloudWatch Events.
- Port data to Amazon Athena and Amazon QuickSight to generate fleet-wide reports.
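To act on non-compliant resources in a script, you can filter the summary items returned by the Systems Manager ListResourceComplianceSummaries API. The sketch below does the filtering on the client side and uses a small hand-written sample in place of a real API response.

```python
def non_compliant_ids(summary_items):
    """Return the sorted, de-duplicated IDs of NON_COMPLIANT resources."""
    return sorted(
        {
            item["ResourceId"]
            for item in summary_items
            if item.get("Status") == "NON_COMPLIANT"
        }
    )


# Hand-written sample shaped like entries of ResourceComplianceSummaryItems.
sample_items = [
    {"ResourceId": "i-aaa", "ComplianceType": "Patch", "Status": "COMPLIANT"},
    {"ResourceId": "mi-bbb", "ComplianceType": "Patch", "Status": "NON_COMPLIANT"},
    {"ResourceId": "mi-bbb", "ComplianceType": "Association", "Status": "NON_COMPLIANT"},
]

to_remediate = non_compliant_ids(sample_items)

# With boto3 installed and credentials configured, the items could come from:
#   boto3.client("ssm").list_resource_compliance_summaries()["ResourceComplianceSummaryItems"]
```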
Working with Systems Manager Inventory Data
AWS Systems Manager Inventory helps you query inventory data from multiple AWS Regions and accounts.
Querying Inventory Data from Multiple Regions and accounts:
The following steps need to be performed in the master account (and Region). Read this documentation.
Querying Inventory Data from multiple Regions and accounts using Amazon Athena
- Log in to the Amazon Athena console.
- Choose the database.
- Run a query in the Query Editor.
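As an example of such a query, the sketch below counts managed instances per account and Region and wraps the SQL in arguments for the Athena StartQueryExecution API. The table and column names (aws_instanceinformation, accountid, region, resourceid) and the database and output location are assumptions about how the synced data is catalogued; check the names in your own Athena database.

```python
def inventory_query_params(database, output_location):
    """Build StartQueryExecution arguments for a per-account instance count."""
    # Assumed table and column names; adjust to match your catalog.
    query = (
        "SELECT accountid, region, COUNT(DISTINCT resourceid) AS instances "
        "FROM aws_instanceinformation "
        "GROUP BY accountid, region"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical database name and results location.
athena_params = inventory_query_params(
    "ssm_inventory", "s3://example-athena-results/inventory/"
)

# With boto3 installed and credentials configured:
#   boto3.client("athena").start_query_execution(**athena_params)
```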
Querying Inventory Data from Multiple Regions and accounts Using Amazon QuickSight
- Log in to the Amazon QuickSight console.
- Choose Manage data.
- Choose New data set.
- Under Create a Data Set, choose Athena.
- Enter the Amazon Athena Data source name.
- Choose Validate connection.
- Choose Create data source.
- Choose the table from which you want to develop dashboards.
- Choose the appropriate fields for different types of visualizations.
Remediating compliance issues
If you identify any instances that are in NON_COMPLIANT status, you can quickly remediate patch and association compliance issues by using Systems Manager Run Command. You can target either instance IDs or Amazon EC2 tags and run the AWS-RunPatchBaseline document or the AWS-RefreshAssociation document.
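The same remediation can be started from a script through the Systems Manager SendCommand API. The following sketch builds the arguments for the AWS-RunPatchBaseline document and supports both targeting styles mentioned above. The example tag reuses the “Patch Group” / “SSM-Private” tag from the setup section.

```python
def patch_command_params(operation, instance_ids=None, tag_key=None, tag_values=None):
    """Build SendCommand arguments for AWS-RunPatchBaseline.

    Target either explicit instance IDs or instances carrying a tag.
    """
    if operation not in ("Scan", "Install"):
        raise ValueError("operation must be 'Scan' or 'Install'")
    params = {
        "DocumentName": "AWS-RunPatchBaseline",
        "Parameters": {"Operation": [operation]},
    }
    if instance_ids:
        params["InstanceIds"] = list(instance_ids)
    elif tag_key and tag_values:
        params["Targets"] = [{"Key": f"tag:{tag_key}", "Values": list(tag_values)}]
    else:
        raise ValueError("provide instance IDs or a tag key with values")
    return params


by_tag = patch_command_params("Install", tag_key="Patch Group", tag_values=["SSM-Private"])

# With boto3 installed and credentials configured:
#   boto3.client("ssm").send_command(**by_tag)
```

Using “Scan” instead of “Install” reports missing patches without installing them, which is useful for a dry run before remediation.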
Conclusion
In this blog post, we have seen how AWS Systems Manager can be used as a tool to manage your inventory of servers, patch them, and check them for compliance from a centralized web console. The same process discussed for Azure can be extended to your on-premises servers.
AWS Systems Manager integrates well with other services, and provides the following features to help you manage virtual machines from a single location:
- Centralized access control for your servers and VMs by using AWS Identity and Access Management (IAM).
- Centralized auditing on actions performed on your servers and VMs using AWS CloudTrail.
- Centralized and secure remote management of your on-premises workloads using your existing scripts.
- Centralized monitoring using CloudWatch Events and Amazon SNS to send notifications on service executions.
About the Authors
Sarat Guttikonda is a Global Solutions Architect at Amazon Web Services. He is a serverless enthusiast, and helps Financial Services customers deploy secure, resilient, and scalable applications on AWS.
Divya Elaty is a VP of Cloud Engineering at Moody’s Corporation. She is a cloud enthusiast and works in the areas of cloud architecture, cloud security, cloud automation and orchestration.