AWS Cloud Operations Blog

Implement operations observability in landing zone environments

In an earlier blog post, Automate customized deployment of cross-account/cross-region CloudWatch dashboards using tags, we showed you how to implement Amazon CloudWatch dashboards for specific events with automation. This solution is great for seasonal events, holidays, important releases, and other use cases.

In this blog post, we review a landing zone environment and share a solution that can help you improve operational visibility at scale while eliminating repetitive Day 2 configuration tasks and overhead. When working with governance at scale, resource tagging makes it easy to identify the application stack and its provisioned resources for use cases such as automation and cost optimization or cost visibility.

AppName is a common resource tag that's used across business units. Its tag value identifies the application deployed in different AWS accounts in the landing zone. The solution described in this blog post lets you centralize observability functions in a central monitoring/operational AWS account and dynamically generate Amazon CloudWatch dashboards for each application stack, based on tagged resource values, using an AWS Lambda function triggered by Amazon EventBridge. The operations team can then onboard a new application stack for any business unit by adding its tag keys and values to AWS Systems Manager Parameter Store in the monitoring/operational AWS account.

Solution overview

In this solution, the CloudWatch dashboard resides in a monitoring account. We collect data from accounts referred to as X, Y, and Z. Our objective is to have the CloudWatch dashboard contain aggregate metrics from all member accounts in the landing zone. To provide per-application stack observability in the monitoring account, we are using cross-account dashboard functionality in Amazon CloudWatch, where aggregated data is presented from the defined member accounts. Any resources in the monitoring account can be included in the dashboard, too.

If you followed the steps in our previous blog post, you set up CloudWatch data sharing in accounts X, Y, and Z. You also set up CloudWatch in the monitoring account so you can view the shared data. Your application resources should be tagged with a tag key (for example, AppName) and a unique tag value (for example, AppX) in member accounts. The Lambda function used in this blog post supports the monitoring of the following AWS services and resources: Amazon EC2, Amazon RDS, AWS Lambda, Amazon ElastiCache, Classic Load Balancer, Application Load Balancer, Network Load Balancer. This solution uses resources (IAM roles and policies) configured in the previous blog post.
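
The tag-based discovery described above can be sketched roughly as follows. This is a minimal illustration, not the code of the Lambda function from this post: the function names `find_tagged_arns` and `group_by_service` are hypothetical, and the client passed in is assumed to behave like `boto3.client("resourcegroupstaggingapi")`.

```python
from collections import defaultdict
from typing import Any, Dict, List


def find_tagged_arns(tagging_client: Any, tag_key: str, tag_values: List[str]) -> List[str]:
    """Return ARNs of resources whose tag_key matches any of tag_values.

    tagging_client is assumed to expose get_resources() like the
    Resource Groups Tagging API client in boto3.
    """
    arns: List[str] = []
    token = ""
    while True:
        kwargs: Dict[str, Any] = {"TagFilters": [{"Key": tag_key, "Values": tag_values}]}
        if token:
            kwargs["PaginationToken"] = token
        page = tagging_client.get_resources(**kwargs)
        arns.extend(m["ResourceARN"] for m in page.get("ResourceTagMappingList", []))
        # An empty token means there are no more pages.
        token = page.get("PaginationToken", "")
        if not token:
            return arns


def group_by_service(arns: List[str]) -> Dict[str, List[str]]:
    """Group ARNs by their service field (arn:partition:service:region:account:resource)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for arn in arns:
        parts = arn.split(":", 5)
        if len(parts) == 6 and parts[0] == "arn":
            grouped[parts[2]].append(arn)
    return dict(grouped)
```

With a real client, a call such as `find_tagged_arns(client, "AppName", ["AppX"])` would need to be repeated for each Region and member account, because the tagging API returns only the resources visible to the calling credentials in one Region.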

Architecture diagram: the Lambda function is deployed in the operational (monitoring) account, and there are three member accounts (X, Y, and Z). All accounts contain AWS resources. The Lambda function assumes a role in each member account and looks through its resources to capture the data required to build the CloudWatch dashboard in the monitoring account.

Figure 1: Solution architecture
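
The cross-account access shown in the architecture can be sketched as follows. This is an illustrative outline under assumptions, not the post's Lambda code: the helper names and the session name `dashboard-discovery` are hypothetical, the role name `AllowMonitoringAccountAccess` is the one created in the earlier blog post, and `sts_client` is assumed to behave like `boto3.client("sts")`.

```python
import re
from typing import Any, Dict

ROLE_NAME = "AllowMonitoringAccountAccess"  # role created in each member account


def member_role_arn(account_id: str, role_name: str = ROLE_NAME) -> str:
    """Build the ARN of the role to assume in a member account."""
    if not re.fullmatch(r"[0-9]{12}", account_id):
        raise ValueError(f"Expected a 12-digit account ID, got {account_id!r}")
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def member_credentials(sts_client: Any, account_id: str) -> Dict[str, str]:
    """Assume the member-account role and return keyword arguments
    that can be passed to boto3.client() for that account."""
    response = sts_client.assume_role(
        RoleArn=member_role_arn(account_id),
        RoleSessionName="dashboard-discovery",  # hypothetical session name
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dictionary could then be unpacked into a client constructor, for example `boto3.client("resourcegroupstaggingapi", region_name="us-east-1", **member_credentials(sts, account_id))`, so that discovery runs with the member account's temporary credentials.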

Solution steps and deployment

The solution architecture shows the following components and steps:

  1. In the monitoring account, in AWS Systems Manager Parameter Store, add five parameters: dashboard-prefix, search-key, search-regions, search-values, and cross-account-policy-name.
  2. Complete steps 1, 2, 4, and 5 in the Automate customized deployment of cross-account/cross-region CloudWatch Dashboards using tags blog post.
  3. Create an IAM policy for retrieving the AWS Systems Manager Parameter Store parameters.
  4. In the monitoring account, use the AWS Lambda console to create a Lambda function and associate IAM policies.
  5. Tag your AWS resources in each member account.
  6. In the monitoring account, configure or update EventBridge to run the Lambda function on a schedule.

Step 1: In the monitoring account, add AWS Systems Manager Parameter Store parameters

  1. Sign in to the monitoring account.
  2. In the Systems Manager console, choose Application Management, and then choose Parameter Store.
  3. Under Parameter Store, choose Create Parameter.
  4. In Parameter details, enter the following to create the parameters.

Name: /$region/$environment/dashboard-prefix
Type: String
Value: example-dashboard

Name: /$region/$environment/search-key
Type: String
Value: AppName

Name: /$region/$environment/search-regions
Type: StringList
Value: us-east-1,us-west-2

Name: /$region/$environment/search-values
Type: StringList
Value: AppX,AppY,AppZ

Name: /$region/$environment/cross-account-policy-name
Type: String
Value: CrossAccountDashboardDiscoveryPolicy

The Create Parameter page showing all the details that are required to make the parameter.

Figure 2: Create parameter

After you create the parameters, use the My parameters tab to confirm that each parameter shows the correct type (String or StringList), as shown in Figure 3:

My parameters page showing the parameters you have created.

Figure 3: My parameters

Alternatively, you can use the AWS CLI to add AWS Systems Manager Parameter Store parameters:

aws ssm put-parameter --name "/us-east-1/dev/dashboard-prefix" --type String --value "example-dashboard"

aws ssm put-parameter --name "/us-east-1/dev/search-key" --type String --value "AppName"

aws ssm put-parameter --name "/us-east-1/dev/search-regions" --type StringList --value "us-east-1,us-west-2"

aws ssm put-parameter --name "/us-east-1/dev/search-values" --type StringList --value "AppX,AppY,AppZ"

aws ssm put-parameter --name "/us-east-1/dev/cross-account-policy-name" --type String --value "CrossAccountDashboardDiscoveryPolicy"

For more information, see Create a Systems Manager parameter (AWS CLI) in the AWS Systems Manager User Guide.
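
To illustrate how a function might read these five parameters back, here is a minimal sketch. It is an assumption about one reasonable approach, not the lookup code in the post's Lambda function: `load_settings` and `PARAMETER_KEYS` are hypothetical names, and `ssm_client` is assumed to behave like `boto3.client("ssm")`.

```python
from typing import Any, Dict, List, Union

# The five parameter names created in this step, relative to the path prefix.
PARAMETER_KEYS = [
    "dashboard-prefix",
    "search-key",
    "search-regions",
    "search-values",
    "cross-account-policy-name",
]


def load_settings(ssm_client: Any, path_prefix: str) -> Dict[str, Union[str, List[str]]]:
    """Fetch the parameters under path_prefix (for example, /us-east-1/dev/).

    StringList values are split on commas into Python lists.
    """
    prefix = path_prefix if path_prefix.endswith("/") else path_prefix + "/"
    names = [prefix + key for key in PARAMETER_KEYS]
    response = ssm_client.get_parameters(Names=names)

    missing = response.get("InvalidParameters", [])
    if missing:
        raise ValueError(f"Parameters not found in Parameter Store: {missing}")

    settings: Dict[str, Union[str, List[str]]] = {}
    for param in response["Parameters"]:
        key = param["Name"][len(prefix):]
        if param["Type"] == "StringList":
            settings[key] = [item.strip() for item in param["Value"].split(",")]
        else:
            settings[key] = param["Value"]
    return settings
```

A single `get_parameters` call is enough here because the API accepts up to 10 names per request. Failing fast on `InvalidParameters` makes a mistyped path prefix easier to spot than a missing dashboard later on.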


Step 2: Complete the steps from the earlier blog post

Complete the following steps in the Automate customized deployment of cross-account/cross-region CloudWatch dashboards using tags blog post:

Step 1: In accounts X, Y, and Z, set up cross-account functionality in CloudWatch to share data with the monitoring account

Step 2: In the monitoring account, set up cross-account functionality in CloudWatch to access the shared data from accounts X, Y, and Z

Step 4: In accounts X, Y, and Z, create the AllowMonitoringAccountAccess role to provide access to the monitoring account

Step 5: Create CrossAccountDashboardDiscoveryPolicy, CloudWatchDashboardCustomPolicy, and IAMCustomPolicy in the monitoring account


Step 3: Create an IAM policy for retrieving the AWS Systems Manager Parameter Store parameters

Create an IAM policy in the monitoring account that will allow the Lambda function to retrieve values from AWS Systems Manager Parameter Store.

To create the GetCloudWatchDashboardCreationParametersFromSSM policy:

  1. Sign in to the monitoring account.
  2. In the IAM console, choose Policies, and then choose Create policy.
  3. Choose the JSON tab.
  4. Replace the default statement with the following policy. Update the $environment, $region, and $monitoring_account_number variables to match your environment.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ssm:GetParameter",
                "ssm:GetParameters"
            ],
            "Resource": "arn:aws:ssm:$region:$monitoring_account_number:parameter/$region/$environment/*",
            "Effect": "Allow"
        }
    ]
}
  5. Choose Review Policy.
  6. On the Review policy page, enter a name (for example, GetCloudWatchDashboardCreationParametersFromSSM) and an optional description, and then choose Create policy.

For more information, see Creating IAM policies in the IAM User Guide.
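
If you prefer to fill in the policy variables programmatically rather than by hand, the substitution can be sketched as follows. This is a minimal example under assumptions: `render_policy` is a hypothetical helper, and it relies on the fact that the `$region`, `$environment`, and `$monitoring_account_number` placeholders used in this post match the syntax of Python's `string.Template`.

```python
import json
from string import Template

# Resource ARN pattern from the policy above, with the same placeholders.
RESOURCE_TEMPLATE = Template(
    "arn:aws:ssm:$region:$monitoring_account_number:parameter/$region/$environment/*"
)


def render_policy(region: str, environment: str, monitoring_account_number: str) -> str:
    """Return the policy document as a JSON string with placeholders filled in."""
    resource = RESOURCE_TEMPLATE.substitute(
        region=region,
        environment=environment,
        monitoring_account_number=monitoring_account_number,
    )
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameter", "ssm:GetParameters"],
                "Resource": resource,
            }
        ],
    }
    return json.dumps(policy, indent=4)
```

The resulting string could be pasted into the JSON tab, or passed as `PolicyDocument` to the `create_policy` call of a boto3 IAM client together with the policy name used in this step.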


Step 4: Create a Lambda function, update the IAM policy for the function, and add environment variables in the monitoring account

  1. Sign in to the monitoring account.
  2. In the AWS Lambda console, choose Functions, and then choose Create a function.
  3. Leave Author from scratch selected. For Function name, enter AutomateLandingZoneDashboards. For Runtime, choose Python 3.8.
  4. Expand Change default execution role, make a note of the IAM role that will be created for this Lambda function (for example, AutomateLandingZoneDashboards-role-unlbygt9), and then choose Create function.
  5. On the Configuration tab, choose Edit. For Timeout, enter 15 seconds, and then choose Save.
  6. Copy the content of the cw-automatelandingzonedashboard.py file from GitHub and paste it into the function's code editor. Edit line 11 in the Lambda function code: change the pathprefix value to reflect the AWS Systems Manager Parameter Store path that will be used to look up parameters and values (for example, /us-east-1/dev/), and then choose Deploy.
  7. Go to the IAM console, update the IAM role created by the Lambda function (for example, AutomateLandingZoneDashboards-role-unlbygt9), and then attach the following IAM policies:
    • CrossAccountDashboardDiscoveryPolicy
    • CloudWatchDashboardCustomPolicy
    • IAMCustomPolicy
    • ResourceGroupsandTagEditorReadOnlyAccess
    • GetCloudWatchDashboardCreationParametersFromSSM

Note: The AWSLambdaBasicExecutionRole-**** managed policy will already be attached to this role.

  8. Go back to the AWS Lambda console and choose Lambda function. Choose AutomateLandingZoneDashboards, and then choose Test.
  9. For Configure test event, enter a name for the event, and then choose Create.

Note: The Lambda function looks for resources in us-east-1 and us-west-2.

For more information, see Create a Lambda function with the console in the AWS Lambda Developer Guide.
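
To give a sense of what generating a per-application dashboard involves, here is a rough sketch of building a dashboard body with one cross-account metric widget. It is not taken from cw-automatelandingzonedashboard.py. The helper names, the widget layout, and the `<prefix>-<app>` naming scheme are assumptions for illustration, and `cloudwatch_client` is assumed to behave like `boto3.client("cloudwatch")`. The `accountId` rendering property is what points a metric at a member account in a cross-account dashboard.

```python
import json
from typing import Any, Dict, List


def cpu_widget(app: str, account_id: str, region: str,
               instance_ids: List[str], y: int = 0) -> Dict[str, Any]:
    """Build one metric widget showing EC2 CPU for instances in a member account."""
    metrics = [
        ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id, {"accountId": account_id}]
        for instance_id in instance_ids
    ]
    return {
        "type": "metric",
        "x": 0,
        "y": y,
        "width": 12,
        "height": 6,
        "properties": {
            "title": f"{app} EC2 CPUUtilization ({account_id})",
            "region": region,
            "stat": "Average",
            "period": 300,
            "metrics": metrics,
        },
    }


def publish_dashboard(cloudwatch_client: Any, prefix: str, app: str,
                      widgets: List[Dict[str, Any]]) -> str:
    """Create or overwrite the dashboard for one application and return its name."""
    name = f"{prefix}-{app}"  # assumed naming scheme: dashboard-prefix plus tag value
    cloudwatch_client.put_dashboard(
        DashboardName=name,
        DashboardBody=json.dumps({"widgets": widgets}),
    )
    return name
```

Because `put_dashboard` replaces the whole dashboard body on each call, rebuilding the widget list from the currently tagged resources on every run is one simple way to keep a dashboard in step with tag changes.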


Step 5: Tag your AWS resources

  1. From each member account, obtain the resource tag that governs the resources relevant to a particular application stack.
  2. For the tag key, use AppName. For the tag value, use AppX.

For more information, see Tagging AWS resources in the AWS General Reference.
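
If you need to apply the tag to many existing resources, the tagging can also be scripted. The following is a minimal sketch under assumptions: `tag_in_batches` is a hypothetical helper, and `tagging_client` is assumed to behave like `boto3.client("resourcegroupstaggingapi")` created with credentials for the member account and in the Region where the resources live. The batch size of 20 reflects the documented limit of ARNs per `tag_resources` call.

```python
from typing import Any, Dict, List


def tag_in_batches(tagging_client: Any, arns: List[str], tag_key: str,
                   tag_value: str, batch_size: int = 20) -> Dict[str, Any]:
    """Apply one tag to many resources and return any per-resource failures."""
    failed: Dict[str, Any] = {}
    for start in range(0, len(arns), batch_size):
        batch = arns[start:start + batch_size]
        response = tagging_client.tag_resources(
            ResourceARNList=batch,
            Tags={tag_key: tag_value},
        )
        # tag_resources reports partial failures here instead of raising.
        failed.update(response.get("FailedResourcesMap", {}))
    return failed
```

For example, `tag_in_batches(client, arns, "AppName", "AppX")` would tag every ARN in the list and return an empty dictionary if all of them succeeded.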


Step 6: Configure or update EventBridge in the monitoring account

In the monitoring account where the Lambda function is located, add a trigger for EventBridge (CloudWatch Events) to make the Lambda function run every 5 minutes. If you add or remove tags, the CloudWatch dashboard will be automatically updated at regular intervals. You can customize the trigger time to your requirements.

  1. Sign in to the monitoring account.
  2. In the AWS Lambda console, choose Functions, and then choose AutomateLandingZoneDashboards.
  3. In the Designer section, choose Add trigger, and then choose a trigger of EventBridge (CloudWatch Events).
  4. Under Rule, choose Create a new rule.
  5. For Rule name, enter EventBridgeAutomateLandingZoneDashboards. For Rule type, choose Schedule expression. You can enter the expression that best fits your use case. In this post, we use every 5 minutes.

For more information, see Schedule AWS Lambda Functions Using EventBridge in the Amazon EventBridge User Guide.
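
The console steps above can also be approximated in code. The sketch below is illustrative only: the helper names, the statement ID, and the target ID are hypothetical, and the two clients are assumed to behave like `boto3.client("events")` and `boto3.client("lambda")`. It also shows one detail that is easy to get wrong in a rate expression: the unit is singular for a value of 1 and plural otherwise.

```python
from typing import Any


def rate_expression(value: int, unit: str = "minute") -> str:
    """Build an EventBridge rate expression such as rate(5 minutes)."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"Unsupported unit: {unit!r}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


def schedule_function(events_client: Any, lambda_client: Any, rule_name: str,
                      function_arn: str, minutes: int = 5) -> str:
    """Create a scheduled rule, allow it to invoke the function, and add the target."""
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=rate_expression(minutes),
        State="ENABLED",
    )
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],  # hypothetical target ID
    )
    return rule["RuleArn"]
```

When you add the trigger through the Lambda console, the invoke permission is added for you. A scripted setup has to grant it explicitly, which is why the sketch calls `add_permission`.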

After the Lambda function runs and identifies each app, it creates a dashboard for each app. The dashboards are displayed in Custom Dashboards, as shown in Figure 4.

Example of the Amazon CloudWatch console showing the dashboards that were created for each application.

Figure 4: Custom dashboards

After the solution has been deployed and all application stacks and resources we want to monitor have been tagged, here are some examples of the CloudWatch dashboards:

Example dashboard showing metrics being collected by resources.

Figure 5: CloudWatch dashboard example

Second example dashboard showing metrics being collected by resources.

Figure 6: CloudWatch dashboard alt example

Conclusion

In this blog post, we showed you how to create the IAM policy that is required to retrieve details from AWS Systems Manager Parameter Store and use them to dynamically generate Amazon CloudWatch dashboards. This solution removes the need to manually manage and update CloudWatch dashboards for the application stacks in your landing zone. By using tags and the automation of EventBridge and Lambda, you can achieve observability automation at scale.

About the authors

Salman Ahmed

Salman Ahmed is a Senior Technical Account Manager in AWS Enterprise Support. He enjoys working with customers to help them design, implement, and support cloud infrastructure. He also has a passion for networking services. Salman uses his more than 10 years of experience to help customers with the adoption of AWS Transit Gateway and AWS Direct Connect.

Mike Gomez

Mike Gomez is an Enterprise Support Lead in AWS Enterprise Support who helps customers achieve and maintain operational excellence. Mike has a passion for Reliability Engineering and IT Operations. With a background in Travel and Hospitality, Media and Entertainment, and Banking, he helps customers achieve their business goals through cross-industry innovation.

Nikola Bravo

Nikola Bravo is a Senior Cloud Architect in the AWS Professional Services, Private Equity team. With more than 15 years of working with Fortune 100 companies, hedge funds, banking, financial and private equity firms, Nikola is focused on delivering measurable business value through transformation, modernization, and digital innovation while successfully implementing large-scale, complex, and mission-critical strategic initiatives.

Andy Cracchiolo

Andy Cracchiolo is a Cloud Infrastructure Architect with the AWS Professional Services team. With more than 15 years in IT infrastructure, Andy is an accomplished and results-driven IT professional. In addition to optimizing IT infrastructure, operations, and automation, Andy has a proven track record of analyzing IT operations, identifying inconsistencies, and implementing process enhancements that increase efficiency, reduce costs, and increase profits.