AWS Cloud Operations Blog

How Skai leveraged AWS Step Functions to enforce its tagging policies

Skai is an independent, global marketing platform for strategy, measurement, and best-of-breed activation across all of the world’s most influential digital channels. Skai’s solution provides data-driven insights and optimization technology to help companies make informed decisions and scale performance across critical publishers.

Skai possesses a highly technical engineering organization with over 350 software engineers, data experts, and DevOps engineers.

The challenge

As part of our migration to a cloud-native architecture, we are moving many of our workloads to AWS, including services such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon RDS, and Amazon Elastic Kubernetes Service (Amazon EKS).

As a standard, we wanted to tag all of our cloud resources at creation time with a predefined set of mandatory tags.

We chose to focus on the “Project” tag, which is an important part of our cost optimization efforts.

Our development culture allows most developers to provision AWS resources themselves, which resulted in an increase in our AWS spend. Without proper resource tagging, it is difficult to group resources and to understand their business value.

With many different users and teams now using our AWS cloud infrastructure, we noticed more and more resources being created without the mandatory tags.

Although we tried to increase tagging coverage with different tools, such as AWS Tag Editor and custom boto3 scripts, the percentage of untagged resources kept increasing rapidly.

We wanted to monitor the increase in untagged resources so that we could identify the top contributors (usually automations) and fix them before implementing the next phase: preventing these resources from being created by using Service Control Policies (SCPs).

Solution Overview

Our solution was divided into two phases:

  1. Phase 1 – Identifying newly created resources without the “Project” tag.
  2. Phase 2 – Preventing resources without the “Project” tag from being created by using Service Control Policies (SCPs).

In this blog post, we focus on Phase 1.

To gain visibility into resources that are created without our mandatory tags, we used AWS Config (specifically the AWS-provided required-tags rule), AWS Step Functions, AWS Lambda, Amazon DynamoDB, and AWS CloudTrail.
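As an illustration of the first building block, the sketch below walks the paginated evaluation results of the required-tags rule and yields only the noncompliant resources. It is a minimal sketch and not our production code: the function name `iter_noncompliant_resources` is hypothetical, and the client is passed in so that the loop can be exercised without AWS credentials. In a real run the client would be `boto3.client("config")`, whose `get_compliance_details_by_config_rule` call accepts at most 100 results per page.

```python
from typing import Any, Dict, Iterator


def iter_noncompliant_resources(
    config_client: Any,
    rule_name: str = "required-tags",
    page_size: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per NON_COMPLIANT resource reported by an AWS Config rule.

    `config_client` is expected to behave like boto3.client("config").
    """
    kwargs: Dict[str, Any] = {
        "ConfigRuleName": rule_name,
        "ComplianceTypes": ["NON_COMPLIANT"],
        "Limit": page_size,  # the API allows at most 100 per page
    }
    while True:
        page = config_client.get_compliance_details_by_config_rule(**kwargs)
        for result in page.get("EvaluationResults", []):
            qualifier = result["EvaluationResultIdentifier"]["EvaluationResultQualifier"]
            yield {
                "ResourceType": qualifier["ResourceType"],
                "ResourceId": qualifier["ResourceId"],
                "ResultRecordedTime": result.get("ResultRecordedTime"),
            }
        token = page.get("NextToken")
        if not token:
            break  # last page reached
        kwargs["NextToken"] = token
```

Writing the loop as a generator keeps memory use flat regardless of how many noncompliant resources the rule reports, which matters when the untagged percentage is high.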

This could also have been achieved with a single monolithic AWS Lambda function. However, AWS Step Functions provides visibility into each step, an intuitive way to debug and handle errors, and modularity with minimal code. The figure below shows an overview of our solution:

Figure 1. High-Level Architecture

We created an AWS Step Functions workflow that generates a report listing the resources that were created in the past 24 hours without our mandatory tags, and then sends this report to us by email.

This report allows us to quickly locate newly created untagged resources and to identify who created them. It also makes the second phase easier to implement, because fewer resources will be blocked from being created.
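To make the idea of the report concrete, here is a minimal sketch of how such a summary could be rendered as plain text, with the top creators listed first. The function names and the record layout are assumptions for illustration; only the `UserAgent` and `CreatorArn` field names come from our solution.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_creators(resources: List[Dict[str, str]], limit: int = 5) -> List[Tuple[str, int]]:
    """Count untagged resources per CreatorArn, most frequent first."""
    counts = Counter(r.get("CreatorArn", "unknown") for r in resources)
    return counts.most_common(limit)


def render_report(resources: List[Dict[str, str]]) -> str:
    """Build a plain-text email body from the enriched resource records."""
    lines = [
        f"Untagged resources created in the past 24 hours: {len(resources)}",
        "",
        "Top creators:",
    ]
    for arn, count in top_creators(resources):
        lines.append(f"  {count:>3}  {arn}")
    lines += ["", "Resources:"]
    # Sort so that resources of the same type appear together.
    for r in sorted(resources, key=lambda r: (r["ResourceType"], r["ResourceId"])):
        user_agent = r.get("UserAgent", "unknown")
        lines.append(f"  {r['ResourceType']}  {r['ResourceId']}  {user_agent}")
    return "\n".join(lines)
```

Putting the creator counts at the top of the email reflects how the report is used: the most frequent creators are usually automations, and fixing one of them removes many findings at once.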

The following list describes how we used each AWS service and how we configured the moving parts of our solution.

  1. AWS Config – Gathers noncompliant resources (resources without the “Project” tag) using the AWS-provided required-tags rule.
  2. AWS Lambda – A function with multiple entry points that is used in our step functions. It provides the following functionality:
    1. getYesterdayDate – Filters resources that were created in the past 24 hours.
    2. getLatestElement – Enriches the information for each resource with UserAgent and CreatorArn.
    3. parseEmail – Sends an email with a report of the supported noncompliant resources that were created in the past 24 hours, as identified by the “required-tags” AWS Config rule.
  3. Amazon DynamoDB – Records every filtered resource (a resource created in the past 24 hours without mandatory tags) in a dedicated table. The Lambda function that sends the report reads from this table later.
  4. AWS Step Functions – The backbone of our solution, used to connect and orchestrate the different pieces and services.
  5. AWS CloudTrail – For each noncompliant resource, we query CloudTrail to enrich our report with UserAgent and CreatorArn. This helps point us in the right direction when it comes to fixing the resources and ensuring they are created with the mandatory tags. For example, if we see that a specific resource was created with an EKS role as the ‘CreatorArn’ and ‘eks.amazonaws.com’ as the ‘UserAgent’, we immediately know what we need to fix and where.
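The multi-entry-point Lambda function described above can be sketched as a small dispatcher. This is an illustrative outline under stated assumptions rather than our actual function: the `action` key, the `resources`, `resource`, and `cloudtrail_events` keys, and the `CreationTime` field are all hypothetical. The CloudTrail records are assumed to have the shape returned by the CloudTrail `lookup_events` API (a `CloudTrailEvent` JSON string containing `userAgent` and `userIdentity.arn`), with `EventTime` serialized as an ISO 8601 string. The parseEmail entry point is omitted for brevity.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, Optional


def get_yesterday_date(event: Dict[str, Any], now: Optional[datetime] = None) -> Dict[str, Any]:
    """Keep only resources whose (assumed) CreationTime is within the last 24 hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    recent = [
        r for r in event["resources"]
        if datetime.fromisoformat(r["CreationTime"]) >= cutoff
    ]
    return {"resources": recent}


def get_latest_element(event: Dict[str, Any]) -> Dict[str, Any]:
    """Enrich one resource with UserAgent and CreatorArn from its latest CloudTrail event."""
    resource = dict(event["resource"])
    events = event.get("cloudtrail_events", [])
    if not events:
        # No trail found: keep the resource in the report, flagged as unknown.
        resource.update({"UserAgent": "unknown", "CreatorArn": "unknown"})
        return resource
    latest = max(events, key=lambda e: datetime.fromisoformat(e["EventTime"]))
    detail = json.loads(latest["CloudTrailEvent"])
    resource["UserAgent"] = detail.get("userAgent", "unknown")
    resource["CreatorArn"] = detail.get("userIdentity", {}).get("arn", "unknown")
    return resource


# Each workflow step names the entry point it needs in the (hypothetical) "action" key.
ACTIONS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "getYesterdayDate": get_yesterday_date,
    "getLatestElement": get_latest_element,
}


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Lambda entry point: route the event to the requested action."""
    action = event.get("action")
    if action not in ACTIONS:
        raise ValueError(f"Unknown action: {action!r}")
    return ACTIONS[action](event)
```

Keeping all entry points in one function means a single deployment artifact and a single IAM role, while each workflow step still shows up separately in the execution history.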

To avoid reaching the AWS Step Functions execution history quota, we separated our workflow into two step functions (https://docs.thinkwithwp.com/step-functions/latest/dg/bp-history-limit.html):

  1. Main step function – Iterates through the paginated evaluation results from AWS Config and, for each page of results, triggers our Map step function.

Figure 2. Main Step Function - Iterate through AWS Config report

  2. Map step function – Iterates over the noncompliant resources from each page of the AWS Config rule evaluation.

Figure 3. Map Step Function - Iterate over noncompliant resources
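The split into two step functions can be sketched in Amazon States Language, written here as Python dictionaries so that it can be serialized with `json.dumps`. This is an assumed outline, not our actual definitions: the state names, the placeholder ARNs, and the shape of the page data (`Resources`, `NextToken`) are hypothetical. The point it illustrates is that each page is handled by a separate child execution, which has its own event history and therefore stays well below the 25,000-event quota per execution.

```python
import json
from typing import Any, Dict, List

# Placeholder ARNs, for illustration only.
MAP_STATE_MACHINE_ARN = "arn:aws:states:REGION:ACCOUNT_ID:stateMachine:map-placeholder"
LAMBDA_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:tagging-placeholder"

# Main step function: fetch one page, hand it to a child execution, loop while a token exists.
main_definition: Dict[str, Any] = {
    "Comment": "Walk AWS Config result pages, one child execution per page",
    "StartAt": "GetConfigPage",
    "States": {
        "GetConfigPage": {
            "Type": "Task",
            # Assumed Lambda that returns {"Resources": [...]} plus "NextToken" if more pages exist.
            "Resource": LAMBDA_ARN,
            "Next": "ProcessPage",
        },
        "ProcessPage": {
            "Type": "Task",
            "Resource": "arn:aws:states:::states:startExecution.sync:2",
            "Parameters": {
                "StateMachineArn": MAP_STATE_MACHINE_ARN,
                "Input": {"Resources.$": "$.Resources"},
            },
            "ResultPath": None,  # serialized as null: discard child output, keep the page data
            "Next": "MorePages",
        },
        "MorePages": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.NextToken", "IsPresent": True, "Next": "GetConfigPage"}],
            "Default": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

# Map step function: iterate over the noncompliant resources of a single page.
map_definition: Dict[str, Any] = {
    "Comment": "Process every noncompliant resource on one page",
    "StartAt": "ForEachResource",
    "States": {
        "ForEachResource": {
            "Type": "Map",
            "ItemsPath": "$.Resources",
            "MaxConcurrency": 5,
            "Iterator": {
                "StartAt": "EnrichAndRecord",
                "States": {
                    "EnrichAndRecord": {"Type": "Task", "Resource": LAMBDA_ARN, "End": True}
                },
            },
            "End": True,
        }
    },
}


def dangling_transitions(definition: Dict[str, Any]) -> List[str]:
    """Return transition targets that do not name a state in this definition."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    for state in states.values():
        targets += [state[k] for k in ("Next", "Default") if k in state]
        targets += [choice["Next"] for choice in state.get("Choices", [])]
    return sorted(t for t in set(targets) if t not in states)
```

A quick structural check such as `dangling_transitions(main_definition)` returning an empty list catches mistyped state names before the definition is deployed. It does not replace validation by the Step Functions service itself.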

According to Danny Zalkind, Director, Infrastructure Engineering at Skai – “StepFunctions has allowed us to quickly create a workflow process, integrated with multiple AWS services, in a low-code manner, which would otherwise take us much longer to achieve using custom logic or open source tools. This enabled us to ramp up our resource tagging enforcement, gaining improved cost control. We were able to grow our account cost in a controlled way, increasing only a small, single-digit percentage of cost, while in the midst of a large-scale cloud migration.”

Conclusion

Creating resources with mandatory tags is critical for ensuring compliance as well as controlling cost.
In this post, we saw how, by leveraging AWS Step Functions, you can quickly and efficiently maintain a high percentage of tagged resources (> 85%) and track down the creators of untagged resources.

After implementing our solution and fixing all the findings, we started the next phase of our tagging efforts: using AWS Service Control Policies (SCPs) to actively prevent resources without mandatory tags from being created in the first place.
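For readers who want a feel for that next phase, the sketch below builds one SCP statement that denies launching an EC2 instance when the request carries no “Project” tag. It is an assumed example based on the commonly documented pattern for requiring a tag at creation, not the policy we deployed. The choice of `ec2:RunInstances` and the statement ID are illustrative only.

```python
import json

# Illustrative SCP: deny ec2:RunInstances when the "Project" request tag is absent.
scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRunInstancesWithoutProjectTag",  # hypothetical statement ID
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": ["arn:aws:ec2:*:*:instance/*"],
            # "Null": "true" matches when the tag key is missing from the request.
            "Condition": {"Null": {"aws:RequestTag/Project": "true"}},
        }
    ],
}

scp_json = json.dumps(scp, indent=2)
print(scp_json)
```

Because a deny like this blocks every caller that omits the tag, the Phase 1 report is what makes it safe to turn on: the noisiest creators are fixed first, so few legitimate requests are rejected.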

About the authors:

Roman Fainerman

Roman Fainerman is a senior DevOps engineer and tech lead at Skai, and an AWS Certified DevOps Engineer – Professional.

Steve Mattar

Steve Mattar is a DevOps Architect at Skai.

Judith Lehner

Judith Lehner is a Technical Account Manager at AWS based in Israel. She also focuses on helping customers with cost optimization activities.

Eran Balan

Eran Balan is an AWS Senior Solutions Architect based in Israel. He works with digital native customers to provide them with architectural guidance for building scalable architecture in AWS environments.