AWS Cloud Operations Blog
Monitor the health of AWS Systems Manager agent using Amazon CloudWatch
AWS customers use AWS Systems Manager to view and control their infrastructure on AWS. Using the AWS Systems Manager console, they can view operational data from multiple AWS services and automate operational tasks across AWS resources. AWS Systems Manager helps you maintain security and compliance by scanning your managed instances. It also reports on (or takes corrective action on) policy violations it detects.
To manage an instance in AWS Systems Manager, the AWS Systems Manager agent (SSM agent) must be installed on the instance. SSM agent is Amazon software that can be installed and configured on an EC2 instance, an on-premises server, or a virtual machine (VM). SSM agent makes it possible for AWS Systems Manager to update, manage, and configure these resources. For more information, see About SSM agent in the AWS Systems Manager User Guide.
The latest release of SSM agent, version 3.0, logs start and stop events for both agent and worker processes. This log is sent to Amazon CloudWatch.
In this blog post, I demonstrate how these start and stop events can be made actionable using Amazon CloudWatch alarms to monitor the health of the SSM agent running on the instance.
Overview
First, I set up AWS Systems Manager on an EC2 instance. Then I update the SSM agent on the instance to version 3.0. Next, I set up an Amazon SNS notification through an Amazon CloudWatch alarm so that I am notified when SSM agent starts. Lastly, I test the results.
Set up AWS Systems Manager
If you have already configured Systems Manager, ignore this step. Otherwise, follow the steps in Setting up AWS Systems Manager in the AWS Systems Manager User Guide.
Update to SSM agent version 3.0
If the Amazon Machine Image (AMI) you choose doesn’t have SSM agent installed, you must install it manually.
If you configured your managed instances to automatically update SSM agent by using the target version $LATEST (the default configuration for auto-update), Systems Manager automatically updates SSM agent on your instances to version 3.0 and removes version 2.x. If you manually download SSM agent, the system installs version 2.x, and then upgrades it to version 3.0. We recommend that you automate the process of updating SSM agent by turning on auto-upgrade. For more information, see Automating updates to SSM agent in the AWS Systems Manager User Guide.
You can check the SSM agent version on the managed instances page in the AWS Systems Manager console. In the following example, I have installed the SSM agent on an EC2 Windows instance named WindowsSSMAgentTest. For more information, see Checking the SSM agent version number in the AWS Systems Manager User Guide.
Figure 1: List of managed instances in the AWS Systems Manager console
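If you prefer to check agent versions programmatically rather than in the console, the following is a minimal sketch. It assumes you pass in a boto3 Systems Manager client and that the AgentVersion field returned by DescribeInstanceInformation starts with the major version number. The function name and the default threshold of 3 are illustrative choices, not part of the original walkthrough.

```python
def instances_below_major_version(ssm_client, minimum_major=3):
    """Return (instance_id, agent_version) pairs whose SSM agent
    major version is lower than minimum_major.

    ssm_client is expected to behave like boto3.client("ssm").
    """
    outdated = []
    paginator = ssm_client.get_paginator("describe_instance_information")
    for page in paginator.paginate():
        for info in page.get("InstanceInformationList", []):
            version = info.get("AgentVersion", "")
            major = version.split(".")[0]
            # Treat a missing or unparsable version as outdated so it gets reviewed.
            if not major.isdigit() or int(major) < minimum_major:
                outdated.append((info["InstanceId"], version))
    return outdated


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   print(instances_below_major_version(boto3.client("ssm")))
```

An empty list means every managed instance reported by Systems Manager is already running version 3.x or later.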
Set up CloudWatch log filter and alarm
If you have SSM agent version 3.0 installed, it tracks its start and update events in the logs. For more information, see Sending SSM agent logs to CloudWatch Logs. When SSM agent starts and stops, you can find log entries in the following format in your log stream.
When agent stops:
INFO [Stop @ agent.go.98] [ssm-agent-worker] Stopping ssm agent worker
When agent starts:
INFO [Start @ agent.go.73] [ssm-agent-worker] Starting SSM Agent Worker: amazon-ssm-agent - v3.0.0.0
You can track these start and stop messages to see how often SSM agent restarts in your instances.
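To illustrate how these entries can be counted outside of CloudWatch, for example against an exported log file, here is a small sketch. The regular expression is derived only from the two sample lines above, so it is an assumption that other agent versions use the same layout. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches "[Start @ <source>] [ssm-agent-worker]" or "[Stop @ <source>] [ssm-agent-worker]".
# Based solely on the two sample log lines in this post.
EVENT_RE = re.compile(r"\[(Start|Stop) @ [^\]]+\]\s+\[ssm-agent-worker\]")


def count_worker_events(lines):
    """Count start and stop events of the SSM agent worker in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


sample = [
    "INFO [Stop @ agent.go.98] [ssm-agent-worker] Stopping ssm agent worker",
    "INFO [Start @ agent.go.73] [ssm-agent-worker] Starting SSM Agent Worker: amazon-ssm-agent - v3.0.0.0",
]
print(count_worker_events(sample))
```

A high number of start events relative to the instance uptime suggests that the agent is restarting more often than expected.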
Figure 2: Define pattern page of the CloudWatch console
I used the Amazon CloudWatch console to test potential filter patterns against actual log data. As you can see in Figure 3, I then named the filter and mapped it to a CloudWatch namespace and metric.
Figure 3: Assign metric page of the CloudWatch console
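The console steps above can also be expressed as a call to the CloudWatch Logs PutMetricFilter API. The sketch below only builds the request parameters. The filter name is hypothetical, the log group and namespace are values you supply, and the metric name MyStopAlert is taken from the figure in the Test section. The quoted filter pattern matches log events that contain the exact phrase from the stop message.

```python
def build_stop_metric_filter(log_group_name, metric_namespace, metric_name="MyStopAlert"):
    """Build keyword arguments for the CloudWatch Logs put_metric_filter call.

    Each log event containing the stop message adds 1 to the metric.
    """
    return {
        "logGroupName": log_group_name,
        "filterName": "ssm-agent-worker-stop",  # hypothetical filter name
        "filterPattern": '"Stopping ssm agent worker"',
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": metric_namespace,
                "metricValue": "1",
            }
        ],
    }


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   params = build_stop_metric_filter("my-ssm-agent-log-group", "MySSMAgentNamespace")
#   boto3.client("logs").put_metric_filter(**params)
```

A second filter with the pattern "Starting SSM Agent Worker" and its own metric name would track start events in the same way.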
Test
I restarted the SSM Agent and then analyzed the CloudWatch information looking for SSM Agent start and stop events.
Figure 4: MyStopAlert metric selected in the CloudWatch console
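To be notified through Amazon SNS when this metric records an event, you can attach a CloudWatch alarm to it. The following sketch builds parameters for the CloudWatch PutMetricAlarm API. The alarm name, the five-minute period, and the threshold of one event are assumptions chosen for illustration, and the SNS topic ARN is a value you supply.

```python
def build_stop_alarm(metric_namespace, sns_topic_arn, metric_name="MyStopAlert"):
    """Build keyword arguments for the CloudWatch put_metric_alarm call.

    The alarm fires when at least one stop event is counted in a 5-minute period.
    """
    return {
        "AlarmName": "ssm-agent-worker-stopped",  # hypothetical alarm name
        "Namespace": metric_namespace,
        "MetricName": metric_name,
        "Statistic": "Sum",
        "Period": 300,  # seconds; assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # The metric filter emits data only when a stop is logged,
        # so periods without data are treated as healthy.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Example usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   params = build_stop_alarm("MySSMAgentNamespace",
#                             "arn:aws:sns:us-east-1:123456789012:my-topic")
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```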
You can also use a CloudWatch Logs Insights query for further analysis:
filter @message like /Stopping ssm agent worker/
Figure 5: CloudWatch Logs Insights query showing timestamp and message
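The same query can be run from a script. This is a minimal sketch, assuming you pass in a boto3 CloudWatch Logs client and the name of the log group that receives the SSM agent logs. The function name and polling interval are illustrative. Start and end times are given in seconds since the epoch, as the StartQuery API expects.

```python
import time

# The query string is the one shown in this post.
QUERY = "filter @message like /Stopping ssm agent worker/"


def run_stop_event_query(logs_client, log_group_name, start_time, end_time, poll_seconds=1.0):
    """Run the Logs Insights query and return each result row as a dict of field -> value.

    logs_client is expected to behave like boto3.client("logs").
    """
    query_id = logs_client.start_query(
        logGroupName=log_group_name,
        startTime=start_time,
        endTime=end_time,
        queryString=QUERY,
    )["queryId"]

    # Poll until the query leaves the Scheduled/Running states.
    while True:
        response = logs_client.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)

    if response["status"] != "Complete":
        raise RuntimeError("Query ended with status " + response["status"])

    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]


# Example usage (requires boto3 and AWS credentials; log group name is a placeholder):
#   import boto3
#   now = int(time.time())
#   rows = run_stop_event_query(boto3.client("logs"), "my-ssm-agent-log-group", now - 3600, now)
```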
Conclusion
In this blog post, I showed you how SSM Agent version 3.0 logs agent and worker start and stop events to Amazon CloudWatch. These logs can be used to monitor the health of the agent and trigger notifications using CloudWatch alarms.
For more information, see SSM Agent version 3 in the AWS Systems Manager User Guide.