Optimize compute resources on Amazon ECS with Predictive Scaling

This blog is co-authored by Jooyoung Kim, Senior Containers Specialist Solutions Architect, Abhishek Nautiyal, Senior Product Manager, Amazon ECS and Ankur Sethi, Senior Product Manager, Amazon EC2.

Introduction

Amazon Elastic Container Service (Amazon ECS) is an opinionated, easy-to-use container orchestration service with deep AWS integrations that streamlines the deployment and management of containerized applications at scale. A vital aspect of effective resource management within Amazon ECS is Service Auto Scaling. This feature allows users to automatically adjust the desired count of tasks in an Amazon ECS service, making sure that applications can efficiently meet fluctuating demands without manual intervention.

The Amazon ECS integration with Application Auto Scaling allows users to use target tracking or step scaling policies to automatically adjust the number of tasks of an Amazon ECS service based on an Amazon CloudWatch metric, such as average CPU usage, request count per target, or a custom metric such as queue depth. Users can also use scheduled scaling to define proactive scaling actions that increase or decrease task counts at specific times for workloads with a recurring usage pattern.

Today, we’re excited to introduce Predictive Scaling, a new scaling policy within Amazon ECS Service Auto Scaling, designed to anticipate demand surges by using advanced machine learning (ML) algorithms. Predictive Scaling proactively increases the desired task count, which improves the availability and responsiveness of your applications while reducing the need for over-provisioning and its cost. You can use Predictive Scaling alongside your existing auto scaling policies, such as target tracking or step scaling, so that your applications scale based on both real-time and historic patterns. In this post, we provide an overview of Predictive Scaling, illustrate a scenario where this feature proves beneficial, and guide you through the steps to configure a Predictive Scaling policy for your Amazon ECS service.

Predictive Scaling in Amazon ECS

Although target tracking and step scaling policies are effective in scaling your Amazon ECS services, they are reactive methods that activate only after a demand change has been detected. This usually has little impact when demand changes gradually. However, during significant fluctuations, such as early morning spikes when business resumes, this reactive strategy can lead to delays before a scaling action is initiated and optimal resource usage is achieved. These delays are longer if your applications require extended initialization times, which can result in temporary performance degradation during significant demand fluctuations.

As mitigation, some users over-provision their services by defining lower than ideal thresholds for the usage metric, so that the service has more tasks to handle surges in traffic, offsetting any delays in scaling out tasks just in time. Users also use scheduled scaling to manually define scaling configurations based on anticipated demand patterns. However, using scheduled scaling effectively requires users to manually identify traffic patterns and the corresponding task count values. Furthermore, as traffic patterns evolve over time, users must revisit and adjust their scheduled actions accordingly. Therefore, a more efficient proactive scaling solution is necessary to better align resources with actual demand, thereby enhancing performance and reducing operational costs.

Predictive Scaling addresses this need by allowing users to scale applications in anticipation of demand spikes. This approach provides time for new task replicas to initialize before the demand occurs. Predictive Scaling continually learns the demand patterns specific to your application by using advanced ML algorithms trained on millions of data points. This provides increasingly accurate forecasts and a more customized scaling experience over time. This solution works alongside other reactive scaling policies, which helps maintain high availability by addressing both predicted and real-time demand changes. Predictive Scaling doesn’t trigger scale-ins based on predictions, which helps maintain the necessary capacity during unexpected demand surges. For scale-ins, reactive scaling policies or scheduled scaling must be employed. More precisely, at any given time, you can expect the following scenarios to occur with Predictive Scaling enabled for your Amazon ECS service:

When multiple scaling policies such as Predictive Scaling and Target Tracking Scaling are active, each one independently estimates the desired number of tasks, and the overall desired count is determined by the maximum value among them.
If actual capacity is lower than the predicted capacity, then Amazon ECS Service Auto Scaling scales out your Amazon ECS service so that its desired capacity is equal to the predicted capacity.
If actual capacity is already higher than the predicted capacity, then Amazon ECS Service Auto Scaling doesn’t scale in your Amazon ECS service. This makes sure that you always have enough capacity and that a lower prediction alone never removes tasks. The service is scaled in only when both the predictive and the reactive scaling policies, such as target tracking, estimate lower capacity.
If the predicted desired task count is higher than the maximum task count configured for your service, and you have opted to override the maximum desired task count, then Amazon ECS Service Auto Scaling can add more tasks than your current maximum limit. However, if you haven’t opted to override the limit, then Amazon ECS Service Auto Scaling doesn’t exceed that boundary.
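
To make the rules above concrete, here is a minimal Python sketch of how the estimates could be combined. It is a simplified model for illustration, not the actual Amazon ECS implementation. The function name resolve_desired_count and all of its parameters are hypothetical, and the sketch assumes that only the predictive estimate may exceed the maximum when the override is enabled.

```python
def resolve_desired_count(predicted, reactive, min_tasks, max_tasks,
                          allow_max_breach=False):
    """Combine a predictive and a reactive estimate into one desired task count.

    Hypothetical helper: a simplified model of the rules described in the text.
    """
    # Assumption: reactive policies always stay within the configured maximum.
    reactive_capped = min(reactive, max_tasks)
    # The predictive estimate may exceed the maximum only when the override is on.
    predicted_capped = predicted if allow_max_breach else min(predicted, max_tasks)
    # The highest estimate wins, so a low forecast alone never causes a scale-in.
    return max(min_tasks, reactive_capped, predicted_capped)


# Forecast is higher than the reactive estimate: scale out to the forecast.
print(resolve_desired_count(predicted=10, reactive=4, min_tasks=2, max_tasks=20))  # 10
# Forecast is lower than the reactive estimate: the reactive estimate is kept.
print(resolve_desired_count(predicted=3, reactive=8, min_tasks=2, max_tasks=20))   # 8
# Forecast exceeds the maximum without the override: capped at the maximum.
print(resolve_desired_count(predicted=30, reactive=5, min_tasks=2, max_tasks=20))  # 20
# Forecast exceeds the maximum with the override: the forecast is honored.
print(resolve_desired_count(predicted=30, reactive=5, min_tasks=2, max_tasks=20,
                            allow_max_breach=True))                                # 30
```

Because the result is a maximum over the estimates, the count only drops when every policy estimates lower capacity, which matches the scale-in behavior described above.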

Recommendations for using Predictive Scaling

Predictive Scaling is ideal for applications that have rapidly changing demand and follow a consistent pattern. These could be your user-facing applications with daily and weekly patterns of sudden demand bursts as your users return, or internal applications such as CI/CD tools that follow the business hour patterns. Typically, for these rapid demand spikes, reactive scaling mechanisms take multiple, sequential scaling actions, which delay when you reach the optimal usage level. This delay may get exacerbated if your application tasks take longer to complete the initialization steps, such as registering with load balancers, application bootstrapping, or data replication. Predictive Scaling also acts as a safety net against untimely or inappropriate scale-ins through reactive scaling policies that have context only of real-time metrics, which may not be aligned with an impending demand surge or may have been compromised by actions such as deployments or outages. To explore specific use cases, consider the following examples:

The workload follows a cyclical traffic pattern, characterized by heightened resource usage during regular business hours and a decline in usage during evenings and weekends. For this type of workload, using Predictive Scaling to set the baseline capacity lets its ML algorithms identify historical patterns and provision for the expected traffic ahead of time. We recommend handling any remaining spiky load with reactive policies.
The workload exhibits recurring on-and-off patterns at specific time intervals, such as every two hours or more. While this baseline can be managed through scheduled scaling, it requires periodic data analysis and manual adjustments to cron job settings. Using Predictive Scaling, however, can automate this process, reducing the need for manual intervention and ensuring efficient scaling at the appropriate intervals.
The workload has long initialization times, which makes it difficult to respond to changes in demand using the reactive method. This latency can lead to delays in optimizing resource usage, particularly during pronounced demand shifts, such as the surge in activity at the start of the business day. For applications with numerous dependencies or substantial static assets, these initialization delays can be even longer. Therefore, for applications that exhibit specific time-based patterns and have lengthy initialization processes, setting up Predictive Scaling can enable a faster response to changes in load.

Getting started with Predictive Scaling

You can begin using Predictive Scaling without disrupting your current Auto Scaling behavior. Predictive Scaling policies operate in two modes: Forecast Only and Forecast And Scale. In Forecast Only mode, Predictive Scaling generates capacity forecasts without implementing any changes, allowing you to validate that it accurately anticipates your routine demand patterns. This mode is an excellent way to start with Predictive Scaling, because you can evaluate forecasts directly in your production environment while your existing scaling behavior remains unaffected.

Furthermore, you can create multiple policies in Forecast Only mode to compare different configurations, such as forecasting based on various metrics. Once you have verified the accuracy of the predictions, transitioning to Forecast And Scale mode is straightforward: update the mode of the policy that best suits your application. When switched to this mode, Predictive Scaling actively makes scaling decisions, which enables your applications to respond effectively to anticipated demand spikes.

Walkthrough

Now that we have covered how Predictive Scaling works, this section provides a detailed guide to configuring it. A new service needs at least 24 hours of data before a forecast can be generated. Furthermore, forecast quality improves with more historical data, ideally at least two weeks’ worth.

Before using Predictive Scaling or Amazon ECS Service Auto Scaling, identify the appropriate usage metric and target value, such as CPU usage for compute-intensive applications. Start with load testing in a staging environment to determine optimal settings. For more details, see the Amazon ECS user guide. If target tracking or step scaling is already configured, use the same metric for Predictive Scaling and adjust the target value to balance performance and cost. You can also use Forecast Only mode to refine settings without affecting current configurations.

1) On the Amazon ECS console, you can now use the new Service auto scaling section on your ECS service to explore and configure all scaling policies. If you did not configure autoscaling when creating your service, you can enable service autoscaling from this section. In the walkthrough, we assume that target tracking is already set up with a target value of 60% CPU usage and explore how to configure Predictive Scaling based on that configuration. Click Create scaling policy to configure Predictive Scaling.

Figure 1. Amazon ECS Service auto scaling section

2) The following configuration uses the predefined metric type of CPU usage for the Amazon ECS service, which refers to average CPU usage. By setting the SchedulingBufferTime property to 600 seconds under additional settings, you can opt to initiate the scale-out operation 10 minutes ahead of time.

Figure 2. Create Predictive Scaling

Furthermore, it is advisable to initially configure the Forecast Only mode, allowing you to assess the accuracy and suitability of the forecast before full implementation.

Figure 3. Create Predictive Scaling with Forecast Only mode

Note: Predictive Scaling is not available in the Create and Update Service wizards. To configure Predictive Scaling, use the Service auto scaling section after creating an ECS service.
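
If you prefer to script the policy rather than use the console, the settings from Step 2 can be expressed as a configuration document. The following Python sketch only builds that document as a dictionary. SchedulingBufferTime comes from the step above; the other field names and the metric type ECSServiceCPUUtilization reflect our understanding of the Application Auto Scaling predictive scaling policy configuration and should be treated as assumptions to verify against the API reference. The helper name build_predictive_policy_config is hypothetical.

```python
import json


def build_predictive_policy_config(target_cpu_percent, buffer_seconds,
                                   mode="ForecastOnly"):
    """Build a predictive scaling policy configuration as a plain dict.

    Hypothetical helper; field names are assumptions to check against the
    Application Auto Scaling API reference before use.
    """
    if mode not in ("ForecastOnly", "ForecastAndScale"):
        raise ValueError("mode must be 'ForecastOnly' or 'ForecastAndScale'")
    return {
        "MetricSpecifications": [
            {
                # Target average CPU usage, matching the target tracking policy.
                "TargetValue": float(target_cpu_percent),
                "PredefinedMetricPairSpecification": {
                    # Assumed name of the predefined ECS CPU metric type.
                    "PredefinedMetricType": "ECSServiceCPUUtilization"
                },
            }
        ],
        # Start in Forecast Only mode so that no scaling action is taken.
        "Mode": mode,
        # Begin scaling out this many seconds before the forecasted demand.
        "SchedulingBufferTime": buffer_seconds,
    }


config = build_predictive_policy_config(target_cpu_percent=60, buffer_seconds=600)
print(json.dumps(config, indent=2))
```

Keeping the mode as a parameter means the same document can later be regenerated with ForecastAndScale once the forecasts have been validated.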

3) Having at least two policies in Forecast Only mode allows you to evaluate a policy that uses one metric value against another that uses a different metric value. Go back to Step 2 and configure a second Predictive Scaling policy with a different metric value. After repeating the steps, you obtain the following configuration.

Figure 4. The list of scaling policies configured for the ECS service

4) After creating the policy, at the time of writing, Predictive Scaling reviews up to 14 days of historical data to generate hourly forecasts for the next 48 hours. These forecasts are updated every 6 hours with new CloudWatch data, improving accuracy over time.
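
The timing described here can be sketched with simple date arithmetic. The following Python example lists the 48 hourly forecast slots that follow a given moment and shows when a scale-out would start for one slot with the 600-second buffer from Step 2. It assumes that forecast slots align to the top of each hour, which is an illustrative simplification. The function names forecast_hours and scale_out_start are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def forecast_hours(now, horizon_hours=48):
    """Return the hourly forecast slots following `now` (assumed hour-aligned)."""
    first_slot = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return [first_slot + timedelta(hours=i) for i in range(horizon_hours)]


def scale_out_start(forecast_time, buffer_seconds):
    """Return when scale-out begins so capacity is ready at `forecast_time`."""
    return forecast_time - timedelta(seconds=buffer_seconds)


now = datetime(2025, 1, 6, 7, 20, tzinfo=timezone.utc)
slots = forecast_hours(now)

print(len(slots))                                  # 48
print(slots[0].isoformat())                        # 2025-01-06T08:00:00+00:00
print(scale_out_start(slots[1], 600).isoformat())  # 2025-01-06T08:50:00+00:00
```

In this example, capacity forecasted for 09:00 UTC begins launching at 08:50 UTC, which gives new tasks 10 minutes to initialize.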

5) After a certain period, the Recommendations tab highlights whether adding a predictive scaling policy improves availability while maintaining or slightly increasing costs. You can use these insights to decide when to switch to Forecast And Scale mode. If multiple policies exist in Forecast Only mode, the Best prediction tag identifies the one prioritizing availability at lower costs. You can change the policy tagged as Best prediction to Forecast And Scale mode.

Note: You need additional IAM permissions to access predictive scaling forecasts and recommendations, specifically the application-autoscaling:GetPredictiveScalingForecast action.

Figure 5. Predictive Scaling policy recommendations
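
As an illustration of the permission mentioned in the note above, the following Python sketch builds a minimal IAM policy document that allows the forecast action. The action name comes from the note; the variable name and the wildcard Resource are assumptions made for brevity, and you would typically scope the resource down in practice.

```python
import json

# Hypothetical minimal policy document granting read access to forecasts.
forecast_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["application-autoscaling:GetPredictiveScalingForecast"],
            # Assumption: wildcard used for illustration only.
            "Resource": "*",
        }
    ],
}

print(json.dumps(forecast_read_policy, indent=2))
```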

6) Through the View Chart option, you can evaluate the accuracy of the prediction model by comparing historical data with predicted values.

Figure 6. Predictive Scaling’s Load and Capacity graph
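
If you export the actual and predicted values, you can also quantify the comparison yourself. The following Python sketch computes the mean absolute percentage error (MAPE) for two Forecast Only policies and picks the one with the lower error. MAPE is our choice of metric for illustration; it is not how the console derives the Best prediction tag, which weighs availability and cost. The policy names and sample numbers are made up.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Return the MAPE (in percent) between two equally long series."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    # Skip points where the actual value is zero to avoid division by zero.
    errors = [abs(a - p) / a for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("no non-zero actual values to compare against")
    return 100.0 * sum(errors) / len(errors)


# Made-up hourly task counts: what was needed versus what each policy forecast.
actual_tasks = [4, 6, 10, 12, 8]
forecasts = {
    "cpu-forecast": [4, 6, 9, 12, 8],
    "memory-forecast": [5, 8, 7, 10, 8],
}

scores = {
    name: mean_absolute_percentage_error(actual_tasks, series)
    for name, series in forecasts.items()
}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.1f}% error")
print("lowest error:", best)
```

A symmetric error such as MAPE treats over-forecasting and under-forecasting alike, so you may want to penalize under-forecasts more heavily if availability is your priority.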

7) As shown in the following figure from the Amazon ECS console, a combination of Predictive and Dynamic (target tracking) scaling policies enables you to scale your services effectively. Predictive Scaling is used for baseline capacity, while Dynamic Scaling addresses additional capacity based on current usage. Amazon ECS calculates the recommended number of tasks for each non-scheduled scaling policy and scales according to the policy that provides the highest number of tasks.

Figure 7. Set Forecast and Scale with Best prediction

8) The Scaling Activities section provides a detailed view of predictive scaling activities, offering insights into how your policies are performing.

Figure 8. ECS service scaling activities

Conclusion

Predictive Scaling, combined with reactive scaling, helps you make sure that your Amazon ECS service has the required number of tasks running to handle both predicted and real-time demand. This enables you to build highly available and responsive applications while making effective use of your application’s resources to optimize cost.

Get started with Predictive Scaling by enabling the Forecast Only mode to gain visibility of the predicted capacity without actually taking any scaling actions. You can refine and tune your Predictive Scaling policies by choosing the right set of metrics and adjusting the targeted usage level. Once completed, you can switch to Forecast And Scale mode to proactively scale your Amazon ECS services based on predicted demand.

To learn more about the feature, refer to the Amazon ECS Predictive Scaling Guide. You can use the Amazon ECS console, SDK, or CLI to configure Predictive Scaling for your Amazon ECS services. We encourage you to share your feedback and suggestions on the AWS container services public roadmap.
