AWS Cloud Operations Blog
Migrating to Amazon Managed Service for Prometheus with the Prometheus Operator
The Prometheus Operator allows cluster administrators to manage Prometheus instances running in Kubernetes, making it easy to deploy and manage Prometheus via native Kubernetes components. In this blog post, I will demonstrate how you can deploy Prometheus via the Prometheus Operator, and how you can easily migrate your monitoring workloads to take advantage of Amazon Managed Service for Prometheus. You can continue to use the toolset you’re familiar with to manage your workload while offloading the burden of managing your observability stack.
Amazon Managed Service for Prometheus is a serverless, Prometheus-compatible monitoring service for container metrics that makes it easier to securely monitor container environments at scale. With Amazon Managed Service for Prometheus, you can use the same open-source Prometheus data model and query language that you use today to monitor the performance of your containerized workloads, and also enjoy improved scalability, availability, and security without having to manage the underlying infrastructure.
Prerequisites
For this blog post you will need the following components:
- An Amazon Elastic Kubernetes Service (Amazon EKS) cluster (version 1.23 or above)
- kubectl, the Kubernetes command-line tool
- eksctl command line tool for creating and managing Kubernetes clusters on Amazon EKS
- helm package manager for Kubernetes
- awscurl to query Prometheus-compatible APIs
- Permission to create AWS Identity and Access Management (IAM) roles
- The Amazon Elastic Block Store (Amazon EBS) Container Storage Interface (CSI) driver, which allows Amazon EKS clusters to manage the lifecycle of Amazon EBS volumes for persistent volumes
For this example, I’ve set up an Amazon EKS cluster and updated kubeconfig so that kubectl can reach my cluster. This cluster will connect to AWS resources (like Amazon Managed Service for Prometheus), so I’ve created an IAM OIDC provider for the cluster so that the cluster can use IAM roles for service accounts.
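If you haven’t yet associated an OIDC provider with your cluster, eksctl can do it in one step. A minimal sketch, assuming a cluster named prom-operator-demo (substitute your own cluster name):

# Associate an IAM OIDC provider with the cluster so that Kubernetes
# service accounts can assume IAM roles (IRSA).
eksctl utils associate-iam-oidc-provider --cluster prom-operator-demo --approve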
1. Installing the Prometheus Operator
The Prometheus Operator works by way of Custom Resource Definitions (CRDs). These CRDs extend the Kubernetes API to create and manage applications running in Kubernetes. When you create or update one of these resources, the Prometheus Operator examines the request and adjusts the Kubernetes cluster to match the desired state. The Prometheus Operator includes CRDs for Prometheus, Alertmanager, and a number of other Prometheus-related resources.
For this example I’m using the Getting started guide to install the Prometheus Operator.
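At the time of writing, the guide installs the operator by applying its release bundle; the following is a sketch of those steps, and the guide itself remains the authoritative source:

# Look up the latest Prometheus Operator release tag.
LATEST=$(curl -s https://api.github.com/repos/prometheus-operator/prometheus-operator/releases/latest | jq -cr '.tag_name')
# Apply the bundle containing the operator's CRDs and Deployment.
curl -sL https://github.com/prometheus-operator/prometheus-operator/releases/download/${LATEST}/bundle.yaml | kubectl create -f -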
After following the guide, I have a basic workload in which Prometheus scrapes an instrumented example application. The following configuration shows the Prometheus CRD I am using:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  podMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
With Prometheus running, I can view the Prometheus UI by running:
kubectl port-forward svc/prometheus-operated 9090:9090
The Prometheus web UI is visible in a browser at localhost:9090. After a few minutes, I can see that metrics are being gathered. Additionally, I can create Alertmanager instances and set up monitoring rules to begin monitoring the workload. This getting started guide for alerting walks through how to configure the CRDs for Alertmanager.
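For reference, metrics flow because the Prometheus CRD above selects PodMonitor resources labeled team: frontend. The following is a sketch of a matching PodMonitor, assuming the guide’s example-app deployment exposes a metrics port named web:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-app
  labels:
    # This label must match the podMonitorSelector in the Prometheus CRD.
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  podMetricsEndpoints:
  - port: web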
2. Updating the workload to use Amazon Managed Service for Prometheus
First, set up an Amazon Managed Service for Prometheus workspace. In this post, I use the AWS Controllers for Kubernetes (ACK) for Amazon Managed Service for Prometheus. The ACK controller lets you create native AWS objects using custom resource definitions (CRDs) within the Kubernetes environment. You can also set up a workspace manually using the AWS Command Line Interface (CLI) or the console.
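If you prefer the CLI over ACK, a single command creates an equivalent workspace; the alias and tag here are assumptions matching the ACK configuration shown below:

# Create a workspace directly via the AWS CLI instead of the ACK controller.
aws amp create-workspace --alias prometheus-workspace --tags ClusterName=prom-operator-demo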
I use the following commands to install the ACK controller for Amazon Managed Service for Prometheus on the Amazon EKS cluster, where REGION is the region of the workload.
export SERVICE=prometheusservice
export RELEASE_VERSION=$(curl -sL https://api.github.com/repos/aws-controllers-k8s/${SERVICE}-controller/releases/latest | jq -r '.tag_name | ltrimstr("v")')
export ACK_SYSTEM_NAMESPACE=ack-system
export AWS_REGION=REGION
aws ecr-public get-login-password --region us-east-1 | helm registry login --username AWS --password-stdin public.ecr.aws
helm install --create-namespace -n $ACK_SYSTEM_NAMESPACE ack-$SERVICE-controller \
oci://public.ecr.aws/aws-controllers-k8s/$SERVICE-chart --version=$RELEASE_VERSION --set=aws.region=$AWS_REGION
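Before moving on, I can confirm the controller is healthy. A quick check, assuming the environment variables from above are still set:

# The release should show as deployed and the controller pod should be Running.
helm list -n $ACK_SYSTEM_NAMESPACE
kubectl get pods -n $ACK_SYSTEM_NAMESPACE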
After configuring my environment to use the ACK controller, I create a new Amazon Managed Service for Prometheus workspace using the following configuration.
apiVersion: prometheusservice.services.k8s.aws/v1alpha1
kind: Workspace
metadata:
  name: prometheus-workspace
spec:
  alias: prometheus-workspace
  tags:
    ClusterName: prom-operator-demo
I create the workspace via kubectl. I then run the following kubectl command to retrieve the Workspace ID, which will be used later:
kubectl describe workspace prometheus-workspace
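For a scriptable form, the Workspace ID can usually be read straight from the resource status; the field name below is an assumption based on the ACK controller’s status schema, so verify it against the describe output above:

# Extract just the Workspace ID; assumes the status exposes .status.workspaceID.
kubectl get workspace prometheus-workspace -o jsonpath='{.status.workspaceID}'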
I set up a service role to ingest the metrics from my Amazon EKS cluster into the workspace. The IAM role has aps:RemoteWrite, aps:GetSeries, aps:GetLabels, and aps:GetMetricMetadata permissions on the workspace. The role must have an appropriate trust relationship so the EKS cluster can assume the role. In my case, the role is named amp-iamproxy-ingest-role.
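As a sketch of how such a role could be created from the CLI, following the usual IRSA pattern (ACCOUNT-ID, OIDC_PROVIDER, and the default/prometheus service account are placeholders for your environment):

# Trust policy letting the prometheus service account assume the role via IRSA.
cat > trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Federated": "arn:aws:iam::ACCOUNT-ID:oidc-provider/OIDC_PROVIDER" },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": { "StringEquals": { "OIDC_PROVIDER:sub": "system:serviceaccount:default:prometheus" } }
  }]
}
EOF
aws iam create-role --role-name amp-iamproxy-ingest-role --assume-role-policy-document file://trust-policy.json

# Inline policy granting the ingest and query permissions listed above.
cat > permission-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["aps:RemoteWrite", "aps:GetSeries", "aps:GetLabels", "aps:GetMetricMetadata"],
    "Resource": "*"
  }]
}
EOF
aws iam put-role-policy --role-name amp-iamproxy-ingest-role --policy-name amp-ingest-policy --policy-document file://permission-policy.json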
3. Configuring the workspace remote write endpoint
To use Amazon Managed Service for Prometheus via the Prometheus Operator, I update the Prometheus CRD by adding a remoteWrite configuration as follows:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::ACCOUNT-ID:role/amp-iamproxy-ingest-role"
spec:
  serviceAccountName: prometheus
  podMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
  remoteWrite:
    - url: "https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE-ID/api/v1/remote_write"
      sigv4:
        region: "REGION"
      queueConfig:
        capacity: 2500
        maxShards: 200
        maxSamplesPerSend: 1000
Where ACCOUNT-ID is the AWS account ID, WORKSPACE-ID is the Workspace ID of the workspace you created, and REGION is the AWS Region where the Amazon EKS cluster was created.
To apply the changes, I run the following command:
kubectl apply -f prometheus.yml
Again, I access the Prometheus web UI by running the following command:
kubectl port-forward svc/prometheus-operated 9090:9090
4. Testing the configuration
The Prometheus web UI is visible in a browser at localhost:9090. I can see that a remote_write URL has been added to the server configuration. See Figure 1.
Figure 1: Prometheus configuration has been updated with a remote_write URL.
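Before querying the workspace, I can also sanity-check from the Prometheus side that samples are leaving the cluster. A quick check, assuming the earlier port-forward is still running; note that the exact counter name varies across Prometheus versions:

# Query Prometheus's own remote-write counters through the forwarded UI port.
curl -s 'http://localhost:9090/api/v1/query?query=prometheus_remote_storage_samples_total' | jq .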
I use awscurl to query the Prometheus workspace and verify that data is being ingested:
awscurl --service "aps" --region "REGION" "WORKSPACE_QUERY_URL?query=http_requests_total"
Where REGION is the region of the workspace and WORKSPACE_QUERY_URL is the query URL endpoint of the workspace. The WORKSPACE_QUERY_URL can be viewed on the Amazon Managed Service for Prometheus console for the workspace, or the URL can be constructed as follows:
https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE-ID/api/v1/query
Where REGION is the region of the workspace and WORKSPACE-ID is the Workspace ID of the workspace.
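A successful query returns the standard Prometheus HTTP API response envelope; the values depend on your workload, but the shape looks roughly like this:

{"status":"success","data":{"resultType":"vector","result":[ ... ]}}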
5. Configuring Alertmanager
Because Alertmanager and Prometheus rule functionality is built in to Amazon Managed Service for Prometheus, any existing Alertmanager CRDs are no longer needed. Those CRDs can be removed from your cluster.
The ACK controller has a concept of a RuleGroupsNamespace (which is equivalent to a PrometheusRule in the Prometheus Operator) and an AlertManagerDefinition (which is equivalent to an AlertmanagerConfig in the Prometheus Operator).
I can use a RuleGroupsNamespace to create a new alerting rule. Replace WORKSPACE-ID with the Workspace ID of the workspace in the following configuration.
apiVersion: prometheusservice.services.k8s.aws/v1alpha1
kind: RuleGroupsNamespace
metadata:
  name: default-rule
spec:
  workspaceID: WORKSPACE-ID
  name: default-rule
  configuration: |
    groups:
    - name: example
      rules:
      - alert: HighRequestLatency
        expr: (rate(http_request_duration_microseconds{handler="api"}[2m])/1000000) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Host latency detected
I apply this to my cluster via kubectl. After a few minutes, the rules will appear under the Rules management tab of the workspace. See Figure 2.
Figure 2: Prometheus has been updated with a monitoring rule.
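The same check can be made from the CLI, where WORKSPACE-ID is the Workspace ID from earlier:

# List the rule groups namespaces in the workspace; default-rule should appear.
aws amp list-rule-groups-namespaces --workspace-id WORKSPACE-ID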
I can use an AlertManagerDefinition to send alerts to an Amazon Simple Notification Service (Amazon SNS) topic. Replace WORKSPACE-ID with the Workspace ID, SNS-TOPIC-ARN with the ARN of the Amazon SNS topic where you want to send the alerts, and REGION with the current region of the workload. Make sure that your workspace has permission to send messages to Amazon SNS; a sketch of a topic policy follows the configuration.
apiVersion: prometheusservice.services.k8s.aws/v1alpha1
kind: AlertManagerDefinition
metadata:
  name: alert-manager
spec:
  workspaceID: WORKSPACE-ID
  configuration: |
    alertmanager_config: |
      route:
        receiver: default_receiver
      receivers:
        - name: default_receiver
          sns_configs:
          - topic_arn: SNS-TOPIC-ARN
            sigv4:
              region: REGION
            message: |
              alert_type: {{ .CommonLabels.alertname }}
              event_type: {{ .CommonLabels.event_type }}
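As noted above, the workspace needs permission to publish to the topic. The following is a sketch of a topic policy statement; the Sid and file name are assumptions, and note that set-topic-attributes replaces the entire topic policy, so merge this statement with any existing ones:

# Allow the Amazon Managed Service for Prometheus service to publish to the topic.
cat > sns-topic-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowAmpPublish",
    "Effect": "Allow",
    "Principal": { "Service": "aps.amazonaws.com" },
    "Action": ["sns:Publish", "sns:GetTopicAttributes"],
    "Resource": "SNS-TOPIC-ARN",
    "Condition": { "StringEquals": { "AWS:SourceAccount": "ACCOUNT-ID" } }
  }]
}
EOF
aws sns set-topic-attributes --topic-arn SNS-TOPIC-ARN --attribute-name Policy --attribute-value file://sns-topic-policy.json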
After a few minutes, the Alert manager configuration will appear under the Alert manager tab of your workspace. See Figure 3.
Figure 3: The workspace Alert manager configuration has been updated with an alert manager configuration.
Once you have configured rules and alerts within your workspace, you can delete the Alertmanager, AlertmanagerConfig, and PrometheusRule Prometheus Operator CRDs from your cluster, as they are no longer needed.
Next Steps
In this blog post, I demonstrated the basics of the Prometheus Operator and showed how you can use it to begin taking advantage of Amazon Managed Service for Prometheus, including Alertmanager. Using the steps demonstrated here, you can continue to use the management tools you’re familiar with while offloading the burden of managing your observability stack to Amazon Managed Service for Prometheus.
Alertmanager and rule management are built in to Amazon Managed Service for Prometheus. If you’re using the Prometheus Operator to manage Alertmanager or rules, you can simply delete those resources from your configuration and begin using the ACK controller for Amazon Managed Service for Prometheus. Like the Prometheus Operator, the ACK controller lets you take advantage of managing a Prometheus workspace by using CRDs, but it is optimized to create resources within an AWS account.
As a next step, take advantage of using an Amazon Managed Service for Prometheus workspace by configuring a remoteWrite URL as part of your Prometheus Operator configuration. You can also manage Alertmanager and rules by installing an ACK controller on your Amazon EKS cluster.