AWS Open Source Blog
Using AWS Distro for OpenTelemetry Collector for cross-account metrics collection on Amazon ECS
In November 2020, we announced OpenTelemetry support on AWS with AWS Distro for OpenTelemetry (ADOT), a secure, production-ready, AWS-supported distribution of the Cloud Native Computing Foundation (CNCF) OpenTelemetry project. With ADOT, you can instrument applications to send correlated metrics and traces to multiple AWS solutions, such as our Amazon Managed Service for Prometheus (AMP) and partner monitoring solutions.
Many customers run their applications in separate AWS accounts, and even in separate AWS Regions, and want a central place for observability. In a previous article, we explained how to collect metrics across multiple accounts with Amazon Elastic Kubernetes Service (Amazon EKS). The scenario here is similar, except that we use the ADOT agent to collect application and platform metrics for workloads running on Amazon Elastic Container Service (Amazon ECS), our native container orchestration platform, and send them to an AMP workspace in a central account.
Setup overview
To resolve this challenge, we will use the following structure.
On the workload accounts:
- Create an IAM role to be used by Amazon ECS tasks.
On the central monitoring account:
- Create an AMP workspace.
- Create an IAM role that allows cross-account access to AMP.
On the workload accounts:
- Grant the Amazon ECS task role permission to assume the cross-account IAM role.
- Set up the application and the AWS Distro for OpenTelemetry agent.
- Create an Amazon ECS cluster and run the application.
On the central monitoring account:
- Visualize metrics with Amazon Managed Grafana.
The entire architecture looks like the following:
Workload account: ECS role setup
Logged into the workload account, we create an IAM role that Amazon ECS tasks will use later. The central monitoring account will trust this role, and we will grant the role permission to assume a role in that account.
cat > task-assume-role.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
aws iam create-role --role-name ecs-xaccount-task-role \
--assume-role-policy-document file://task-assume-role.json \
--region eu-west-1
Monitoring account setup
Logged into the central monitoring account, we create an AMP workspace with the AWS CLI. The --query option prints the new workspace ID, which we need later:
aws amp create-workspace --alias ecs-xaccount-metrics-demo --region eu-west-1 \
--query workspaceId --output text
Alternatively, we can use the AWS Management Console and navigate to the AMP service.
We can now create an IAM role with write permissions to the AMP workspace. To grant access to multiple workload accounts, add the corresponding IAM role ARNs to the "AWS" array:
WORKLOAD_ACCOUNT_ID=
cat > policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::$WORKLOAD_ACCOUNT_ID:role/ecs-xaccount-task-role"
]
},
"Action": "sts:AssumeRole",
"Condition": {}
}
]
}
EOF
# Note: This command fails if ecs-xaccount-task-role
# does not exist in the workload account yet.
aws iam create-role \
--role-name ECS-AMP-Central-Role \
--assume-role-policy-document file://policy.json \
--query 'Role.RoleName' \
--output text
aws iam attach-role-policy --role-name ECS-AMP-Central-Role \
--policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
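With many workload accounts, editing the "AWS" array by hand gets error-prone. The following Go program is a minimal sketch that generates the same trust policy for a list of workload account IDs and rejects anything that is not a 12-digit account ID, such as an unset variable. It assumes every workload account uses the role name ecs-xaccount-task-role from this walkthrough; the function names and placeholder account IDs are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// accountIDPattern matches a 12-digit AWS account ID.
var accountIDPattern = regexp.MustCompile(`^[0-9]{12}$`)

// Principal lists the IAM role ARNs that may assume the central role.
type Principal struct {
	AWS []string `json:"AWS"`
}

// Statement is a single statement of the trust policy.
type Statement struct {
	Effect    string    `json:"Effect"`
	Principal Principal `json:"Principal"`
	Action    string    `json:"Action"`
}

// PolicyDocument is the top-level trust policy document.
type PolicyDocument struct {
	Version   string      `json:"Version"`
	Statement []Statement `json:"Statement"`
}

// taskRoleARN returns the ARN of ecs-xaccount-task-role in a workload account
// (assumes the role name used in this walkthrough).
func taskRoleARN(accountID string) (string, error) {
	if !accountIDPattern.MatchString(accountID) {
		return "", fmt.Errorf("invalid account ID %q: expected 12 digits", accountID)
	}
	return "arn:aws:iam::" + accountID + ":role/ecs-xaccount-task-role", nil
}

// centralTrustPolicy builds a trust policy that lets the task role of every
// listed workload account assume ECS-AMP-Central-Role.
func centralTrustPolicy(workloadAccountIDs []string) ([]byte, error) {
	if len(workloadAccountIDs) == 0 {
		return nil, fmt.Errorf("at least one workload account ID is required")
	}
	arns := make([]string, 0, len(workloadAccountIDs))
	for _, id := range workloadAccountIDs {
		arn, err := taskRoleARN(id)
		if err != nil {
			return nil, err
		}
		arns = append(arns, arn)
	}
	doc := PolicyDocument{
		Version: "2012-10-17",
		Statement: []Statement{{
			Effect:    "Allow",
			Principal: Principal{AWS: arns},
			Action:    "sts:AssumeRole",
		}},
	}
	return json.MarshalIndent(doc, "", "  ")
}

func main() {
	// Hypothetical placeholder account IDs: replace them with your own.
	policy, err := centralTrustPolicy([]string{"111111111111", "222222222222"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(policy))
}
```

You could redirect the output to policy.json (for example, go run trustpolicy.go > policy.json, with a hypothetical file name) before running aws iam create-role.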
Workload account
Note: You can repeat instructions in this section for as many workload accounts as needed.
Logged into the workload account, we grant the previously created ecs-xaccount-task-role permission to assume ECS-AMP-Central-Role in the central account:
# Set the central account id
CENTRAL_ACCOUNT_ID=
cat > policy.json <<EOF
{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Allow",
"Action":[
"sts:AssumeRole"
],
"Resource":"arn:aws:iam::${CENTRAL_ACCOUNT_ID}:role/ECS-AMP-Central-Role"
}
]
}
EOF
POLICY_ARN=$(aws iam create-policy --policy-name xaccount-amp-write \
--policy-document file://policy.json | jq -r '.Policy.Arn')
aws iam attach-role-policy --role-name ecs-xaccount-task-role \
--policy-arn $POLICY_ARN
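If CENTRAL_ACCOUNT_ID is left empty, the script above produces a Resource ARN with an empty account field (arn:aws:iam:::role/ECS-AMP-Central-Role). As a minimal sketch, assuming you prefer to generate the policy in Go rather than with a heredoc, the following program reads CENTRAL_ACCOUNT_ID from the environment, validates it, and prints the same identity-based policy. The function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"regexp"
)

// accountIDPattern matches a 12-digit AWS account ID.
var accountIDPattern = regexp.MustCompile(`^[0-9]{12}$`)

// statement is one statement of an IAM identity-based policy.
type statement struct {
	Effect   string   `json:"Effect"`
	Action   []string `json:"Action"`
	Resource string   `json:"Resource"`
}

// policyDocument is the top-level identity-based policy.
type policyDocument struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// centralRoleARN returns the ARN of ECS-AMP-Central-Role in the central account,
// rejecting empty or malformed account IDs (for example, an unset variable).
func centralRoleARN(centralAccountID string) (string, error) {
	if !accountIDPattern.MatchString(centralAccountID) {
		return "", fmt.Errorf("invalid central account ID %q: expected 12 digits", centralAccountID)
	}
	return "arn:aws:iam::" + centralAccountID + ":role/ECS-AMP-Central-Role", nil
}

// assumeRolePolicy builds the policy attached to ecs-xaccount-task-role.
func assumeRolePolicy(centralAccountID string) ([]byte, error) {
	arn, err := centralRoleARN(centralAccountID)
	if err != nil {
		return nil, err
	}
	doc := policyDocument{
		Version: "2012-10-17",
		Statement: []statement{{
			Effect:   "Allow",
			Action:   []string{"sts:AssumeRole"},
			Resource: arn,
		}},
	}
	return json.MarshalIndent(doc, "", "  ")
}

func main() {
	policy, err := assumeRolePolicy(os.Getenv("CENTRAL_ACCOUNT_ID"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(policy))
}
```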
Workload configuration
Next, we:
- Set up a sample application that exposes Prometheus metrics.
- Configure the aws-otel-collector to scrape the application and ECS metrics.
- Build Docker images and host them on Amazon Elastic Container Registry (Amazon ECR).
- Create and configure an Amazon ECS cluster, and run everything using ecs-cli.
The layout should be organized as follows:
├── aws-otel-collector
│ ├── Dockerfile
│ └── config.yaml
├── demo-app
│ ├── Dockerfile
│ └── main.go
├── docker-compose.yml
└── ecs-params.yml
To set up Amazon ECS, we need Docker and ecs-cli. On Linux, ecs-cli can be installed as follows:
sudo curl -Lo /usr/local/bin/ecs-cli https://amazon-ecs-cli.s3.amazonaws.com/ecs-cli-linux-amd64-latest
sudo chmod +x /usr/local/bin/ecs-cli
Now, let’s create the sample application that exposes a /metrics Prometheus endpoint:
mkdir demo-app
cd demo-app/
cat > main.go <<EOF
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Serve the default Go runtime and process metrics on :8000/metrics.
func main() {
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8000", nil))
}
EOF
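promhttp.Handler() in the demo app only serves the Go runtime and process metrics that the client library registers by default. To illustrate what an application-specific counter looks like on the wire, here is a dependency-free sketch that writes one counter in the Prometheus text exposition format by hand. The /hello endpoint and the demo_requests_total metric are hypothetical; in a real application you would keep client_golang and register a counter with it rather than formatting the text yourself.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync/atomic"
)

// requests counts calls to the hypothetical /hello endpoint.
var requests uint64

// renderMetrics formats a single counter in the Prometheus text exposition format.
func renderMetrics(count uint64) string {
	return "# HELP demo_requests_total Total number of /hello requests handled.\n" +
		"# TYPE demo_requests_total counter\n" +
		fmt.Sprintf("demo_requests_total %d\n", count)
}

func main() {
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		atomic.AddUint64(&requests, 1)
		fmt.Fprintln(w, "hello")
	})
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, renderMetrics(atomic.LoadUint64(&requests)))
	})
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```

Running it locally, requesting /hello a few times, and then requesting /metrics shows the counter value increasing; the prometheus receiver would scrape it like any other series on :8000/metrics.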
The following creates a Dockerfile for the application:
cat > Dockerfile <<EOF
FROM golang:1.24 AS builder
WORKDIR /go/src/app
COPY . .
RUN go mod init demo
RUN go get .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
WORKDIR /app
RUN apk --no-cache add ca-certificates
COPY --from=builder /go/src/app/app .
EXPOSE 8000
CMD ["./app"]
EOF
And finally, the following script will create an ECR repository, build the application image, and push the image to Amazon ECR:
APP_REPOSITORY=$(aws ecr create-repository --repository-name demo-app --region eu-west-1 --query repository.repositoryUri --output text)
docker build . -t demo-app
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin $APP_REPOSITORY
docker tag demo-app:latest $APP_REPOSITORY
docker push $APP_REPOSITORY
cd -
Now, let’s configure the AWS Distro for OpenTelemetry Collector. We will create a custom configuration that defines pipelines. A pipeline defines the path that data follows through the collector: data is received by receivers, optionally processed or modified by processors, and finally leaves the collector through exporters.
We will collect metrics from the application's /metrics endpoint with the prometheus receiver, and use the awsecscontainermetrics receiver to collect task and container metrics from the ECS task metadata endpoint. Visit the documentation to learn more about the awsecscontainermetrics receiver and other configuration options.
We will export the collected metrics to the AMP workspace in the monitoring account using the prometheusremotewrite exporter, which signs its requests through the sigv4auth extension. We provide both the AMP remote_write endpoint and the IAM role to assume, in our case ECS-AMP-Central-Role.
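The remote_write endpoint follows a fixed pattern built from the workspace Region and workspace ID, as used in the collector configuration. This small Go helper is a sketch of that pattern; the function name and the ws-example placeholder are hypothetical. It also catches the common mistake of leaving WORKSPACE_ID empty.

```go
package main

import (
	"errors"
	"fmt"
)

// remoteWriteEndpoint builds the AMP remote_write URL for a workspace.
// The region must be the Region where the workspace was created.
func remoteWriteEndpoint(region, workspaceID string) (string, error) {
	if region == "" || workspaceID == "" {
		return "", errors.New("region and workspace ID must both be set")
	}
	return fmt.Sprintf("https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/remote_write",
		region, workspaceID), nil
}

func main() {
	// "ws-example" is a hypothetical placeholder for your real workspace ID.
	endpoint, err := remoteWriteEndpoint("eu-west-1", "ws-example")
	if err != nil {
		panic(err)
	}
	fmt.Println(endpoint)
}
```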
Edit the WORKSPACE_ID and CENTRAL_ACCOUNT_ID variables and run the following script to create the pipeline configuration:
WORKSPACE_ID=
CENTRAL_ACCOUNT_ID=
mkdir aws-otel-collector
cd aws-otel-collector
cat > config.yaml <<EOF
receivers:
  prometheus:
    config:
      global:
        scrape_interval: 15s
        scrape_timeout: 10s
      scrape_configs:
        - job_name: "prometheus-demo-app"
          static_configs:
            - targets: ["localhost:8000"]
  awsecscontainermetrics:
    collection_interval: 20s
processors:
  filter:
    metrics:
      include:
        match_type: strict
        metric_names:
          - ecs.task.memory.utilized
          - ecs.task.memory.reserved
          - ecs.task.cpu.utilized
          - ecs.task.cpu.reserved
          - ecs.task.network.rate.rx
          - ecs.task.network.rate.tx
          - ecs.task.storage.read_bytes
          - ecs.task.storage.write_bytes
exporters:
  prometheusremotewrite:
    endpoint: https://aps-workspaces.eu-west-1.amazonaws.com/workspaces/$WORKSPACE_ID/api/v1/remote_write
    auth:
      authenticator: sigv4auth
  logging:
    loglevel: debug
extensions:
  sigv4auth:
    service: "aps"
    assume_role:
      arn: arn:aws:iam::$CENTRAL_ACCOUNT_ID:role/ECS-AMP-Central-Role
      sts_region: eu-west-1
service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [logging, prometheusremotewrite]
    metrics/ecs:
      receivers: [awsecscontainermetrics]
      processors: [filter]
      exporters: [logging, prometheusremotewrite]
EOF
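To make the pipeline semantics concrete, the following Go program is a toy model, not the collector's actual implementation: each pipeline passes received data points through its processors in order and then hands the result to every exporter. The strictFilter function mimics match_type: strict, keeping only metrics whose names exactly equal an entry in metric_names. The Metric, Processor, and Exporter types and the sample data points are made up for illustration.

```go
package main

import "fmt"

// Metric is a simplified data point: just a name and a value.
type Metric struct {
	Name  string
	Value float64
}

// Processor transforms a batch of metrics.
type Processor func([]Metric) []Metric

// Exporter sends a batch of metrics somewhere.
type Exporter func([]Metric)

// strictFilter keeps only metrics whose names exactly match one of names,
// like match_type: strict in the filter processor's include block.
func strictFilter(names ...string) Processor {
	allowed := make(map[string]bool, len(names))
	for _, n := range names {
		allowed[n] = true
	}
	return func(in []Metric) []Metric {
		out := make([]Metric, 0, len(in))
		for _, m := range in {
			if allowed[m.Name] {
				out = append(out, m)
			}
		}
		return out
	}
}

// runPipeline passes received metrics through each processor in order,
// then fans the result out to every exporter.
func runPipeline(received []Metric, processors []Processor, exporters []Exporter) {
	batch := received
	for _, p := range processors {
		batch = p(batch)
	}
	for _, e := range exporters {
		e(batch)
	}
}

func main() {
	// Made-up sample data points.
	received := []Metric{
		{Name: "ecs.task.memory.utilized", Value: 42},
		{Name: "container.memory.utilized", Value: 40},
	}
	logging := func(b []Metric) { fmt.Println("exported:", b) }
	runPipeline(received, []Processor{strictFilter("ecs.task.memory.utilized")}, []Exporter{logging})
}
```

In our configuration, the metrics pipeline has no processors, so every scraped application metric is exported, while metrics/ecs keeps only the eight listed task-level metrics.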
Starting from the latest version of the aws-otel-collector image, create a custom image that includes our configuration and host it on Amazon ECR:
cat > Dockerfile <<EOF
FROM public.ecr.aws/aws-observability/aws-otel-collector:latest
COPY config.yaml /etc/ecs/otel-config.yaml
CMD ["--config=/etc/ecs/otel-config.yaml"]
EOF
Finally, build and push the image:
COLLECTOR_REPOSITORY=$(aws ecr create-repository --repository-name aws-otel-collector --region eu-west-1 --query repository.repositoryUri --output text)
docker build . -t aws-otel-collector
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin $COLLECTOR_REPOSITORY
docker tag aws-otel-collector:latest $COLLECTOR_REPOSITORY
docker push $COLLECTOR_REPOSITORY
cd -
Run application: Set up Amazon ECS
Amazon ECS needs a task execution role, a set of permissions that allows it to pull container images and send logs on our behalf. Run the following script to create it:
cat > task-execution-assume-role.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
aws iam create-role --role-name ecs-xaccount-task-execution-role \
--assume-role-policy-document file://task-execution-assume-role.json \
--region eu-west-1
aws iam --region eu-west-1 attach-role-policy --role-name ecs-xaccount-task-execution-role \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
Set the WORKLOAD_ACCOUNT_ID variable and run the following script to create a docker-compose file:
WORKLOAD_ACCOUNT_ID=
cat > docker-compose.yml <<EOF
version: "3"
services:
  aws-otel-collector:
    image: $WORKLOAD_ACCOUNT_ID.dkr.ecr.eu-west-1.amazonaws.com/aws-otel-collector:latest
    environment:
      - AWS_REGION=eu-west-1
    logging:
      driver: awslogs
      options:
        awslogs-group: ecs-xaccount-metrics-demo
        awslogs-region: eu-west-1
        awslogs-stream-prefix: aws-otel-collector
  prometheus-demo-app:
    image: $WORKLOAD_ACCOUNT_ID.dkr.ecr.eu-west-1.amazonaws.com/demo-app
    ports:
      - "8000:8000"
    depends_on:
      - aws-otel-collector
    logging:
      driver: awslogs
      options:
        awslogs-group: ecs-xaccount-metrics-demo
        awslogs-region: eu-west-1
        awslogs-stream-prefix: demo-app
EOF
Using ecs-cli, we create an Amazon ECS cluster:
ecs-cli configure --cluster ecs-xaccount-metrics-demo \
--default-launch-type FARGATE \
--config-name ecs-xaccount-metrics-demo \
--region eu-west-1
ecs-cli up --cluster-config ecs-xaccount-metrics-demo
After a few minutes, the cluster is created with all the necessary associated resources. Note the VPC ID and subnet IDs printed by the preceding command, and get the default security group of the VPC:
VPC_ID=
aws ec2 describe-security-groups \
--filters Name=vpc-id,Values=$VPC_ID Name=group-name,Values=default \
--region eu-west-1 \
--query 'SecurityGroups[0].GroupId' \
--output text
Edit the ecs-params.yml file used by ecs-cli, and replace the subnet IDs and the security group ID with the values from the previous outputs:
version: 1
task_definition:
  ecs_network_mode: awsvpc
  task_role_arn: ecs-xaccount-task-role
  task_execution_role: ecs-xaccount-task-execution-role
  task_size:
    mem_limit: 0.5GB
    cpu_limit: 256
run_params:
  network_configuration:
    awsvpc_configuration:
      subnets:
        - "subnet-"
        - "subnet-"
      security_groups:
        - "sg-"
      assign_public_ip: ENABLED
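On Fargate, cpu_limit and mem_limit must form one of the supported task size combinations; 256 CPU units with 0.5GB (512 MiB), as above, is the smallest. The following Go function is a sketch that checks a combination against the Fargate task size table for 256 through 4096 CPU units as we understand it. The larger 8 vCPU and 16 vCPU sizes are omitted and the function name is hypothetical, so confirm the current table in the Fargate documentation before relying on it.

```go
package main

import "fmt"

// validFargateMemory reports whether memMiB is a supported memory size (in MiB)
// for a Fargate task with the given CPU units. Only 256-4096 CPU units are
// covered; larger task sizes are deliberately left out of this sketch.
func validFargateMemory(cpuUnits, memMiB int) bool {
	const gib = 1024
	switch cpuUnits {
	case 256:
		return memMiB == 512 || memMiB == 1*gib || memMiB == 2*gib
	case 512:
		return memMiB >= 1*gib && memMiB <= 4*gib && memMiB%gib == 0
	case 1024:
		return memMiB >= 2*gib && memMiB <= 8*gib && memMiB%gib == 0
	case 2048:
		return memMiB >= 4*gib && memMiB <= 16*gib && memMiB%gib == 0
	case 4096:
		return memMiB >= 8*gib && memMiB <= 30*gib && memMiB%gib == 0
	default:
		return false
	}
}

func main() {
	// cpu_limit: 256 and mem_limit: 0.5GB from ecs-params.yml.
	fmt.Println(validFargateMemory(256, 512))
}
```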
Finally, run the following script to deploy the application:
ecs-cli compose --project-name ecs-xaccount-metrics-demo \
service up \
--cluster-config ecs-xaccount-metrics-demo \
--create-log-groups
After a few minutes, the Amazon ECS service should be up and running. You can check the aws-otel-collector logs in the Amazon CloudWatch Logs console, under the log group ecs-xaccount-metrics-demo.
Monitoring account: Visualize metrics
Back in the monitoring account, let’s visualize our metrics using an Amazon Managed Grafana workspace. Refer to the documentation to set up Amazon Managed Grafana.
We can view metrics coming from the application endpoint:
And the Amazon ECS cluster metrics:
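When building Grafana panels, note that the ECS metrics do not appear under the OpenTelemetry names listed in the filter processor: on the way to AMP, names are converted to valid Prometheus metric names, so dots become underscores (ecs.task.memory.utilized would become ecs_task_memory_utilized). The following Go function is a minimal sketch of that basic character sanitization only, and its name is hypothetical. Depending on the exporter version and settings, unit or type suffixes may also be appended, so check the exact names in Grafana's metrics browser.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeMetricName replaces every character that is not valid in a Prometheus
// metric name ([a-zA-Z0-9_:]) with an underscore, and prefixes "_" when the
// result would otherwise start with a digit.
func sanitizeMetricName(name string) string {
	var b strings.Builder
	for _, r := range name {
		switch {
		case r == '_', r == ':',
			r >= 'a' && r <= 'z',
			r >= 'A' && r <= 'Z',
			r >= '0' && r <= '9':
			b.WriteRune(r)
		default:
			b.WriteRune('_')
		}
	}
	s := b.String()
	if s != "" && s[0] >= '0' && s[0] <= '9' {
		s = "_" + s
	}
	return s
}

func main() {
	for _, name := range []string{"ecs.task.memory.utilized", "ecs.task.cpu.utilized"} {
		fmt.Printf("%s -> %s\n", name, sanitizeMetricName(name))
	}
}
```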
Clean up
Workload account
WORKLOAD_ACCOUNT_ID=
# stop and delete the ECS service
ecs-cli compose --project-name ecs-xaccount-metrics-demo service down --cluster-config ecs-xaccount-metrics-demo
# delete the ECS cluster
ecs-cli down --cluster-config ecs-xaccount-metrics-demo
# delete the ECR repositories and the CloudWatch Logs log group
aws ecr delete-repository --repository-name demo-app --force --region eu-west-1
aws ecr delete-repository --repository-name aws-otel-collector --force --region eu-west-1
aws logs delete-log-group --log-group-name ecs-xaccount-metrics-demo --region eu-west-1
# delete the task role
aws iam detach-role-policy --role-name ecs-xaccount-task-role --policy-arn arn:aws:iam::$WORKLOAD_ACCOUNT_ID:policy/xaccount-amp-write
aws iam delete-policy --policy-arn arn:aws:iam::$WORKLOAD_ACCOUNT_ID:policy/xaccount-amp-write
aws iam delete-role --role-name ecs-xaccount-task-role
# delete the task execution role
aws iam detach-role-policy --role-name ecs-xaccount-task-execution-role --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
aws iam delete-role --role-name ecs-xaccount-task-execution-role
Central account
WORKSPACE_ID=
# delete role
aws iam detach-role-policy --role-name ECS-AMP-Central-Role --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
aws iam delete-role --role-name ECS-AMP-Central-Role
# delete workspace
aws amp delete-workspace --workspace-id $WORKSPACE_ID --region eu-west-1
Conclusion
In this post, we explained how to use the AWS Distro for OpenTelemetry (ADOT) agent to collect application and platform metrics for workloads running on Amazon ECS.
You can use ADOT on other platforms, such as Amazon EKS, Amazon Elastic Compute Cloud (Amazon EC2), or on-premises. Additionally, you can use ADOT to collect distributed trace data and have multiple heterogeneous workload accounts sending metrics centrally to AMP and other platforms. You can also set up private connectivity with VPC endpoints and VPC peering, according to your needs.
Visit the ADOT, AMP, and Amazon Managed Grafana sites to learn more.