Enhancing resource sharing with AWS Outposts

The Amazon Web Services (AWS) service, AWS Outposts, is a game-changer for public sector organizations, offering a unique blend of cloud capabilities and on-premises control. It enables government agencies, educational institutions, and healthcare providers to modernize their IT infrastructure while adhering to strict data residency, security, and compliance requirements. With Outposts, public sector organizations can use AWS services locally, allowing low-latency performance for critical applications and physical control over sensitive data.

Outposts already provides robust capabilities for resource sharing in multi-account environments. Organizations can share Outposts and their associated resources across multiple AWS accounts in the same organization in AWS Organizations using AWS Resource Access Manager (AWS RAM). For more information about the native functionalities in Outposts and best practices in multi-account environments, refer to Sharing AWS Outposts in a multi account AWS environment: Part 1 and Sharing AWS Outposts in a multi account AWS environment: Part 2.

The built-in sharing capabilities provide a solid foundation for multi-account resource management. However, as adoption of Outposts grows, customer expectations rise and use cases become more complex, so additional control and monitoring capabilities may be needed.

AWS received valuable feedback from customers in federal IT, higher education, and digital services. These organizations expressed a need to enhance resource sharing capabilities in AWS Outposts. Specifically, they asked for the ability to do the following:

Monitor and break down resource usage across multiple consumer accounts on the same Outpost.
Implement constraints and quotas for each consumer account to prevent a single consumer from monopolizing available capacity.
Implement usage-based billing at the consumer account level.

To address these needs, AWS took a customer-centric approach, working backwards from their needs to develop a prototype solution. The prototyping and cloud engineering (PACE) team spearheaded this effort. PACE is a group of technologists, researchers, and solution builders who specialize in creating innovative solutions aligned with customer needs. The PACE team’s mission is to accelerate cloud adoption by helping customers envision the art of the possible for solving complex business challenges. Their expertise spans artificial intelligence (AI), machine learning (ML), deep learning, Internet of Things (IoT), natural language understanding, robotics, and augmented and virtual reality.

In this post, we explore the prototype developed by the PACE team using AWS services, open source software, and bespoke source code to address the resource sharing enhancements requested by public sector customers. This solution aims to provide greater flexibility, control, and cost management for organizations leveraging Outposts in multi-account environments, specifically focusing on key AWS resources, including Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Block Store (Amazon EBS), and Amazon Simple Storage Service (Amazon S3) on Outposts.

Figure 1 provides a high-level overview of the prototype, illustrating how it integrates with AWS accounts and manages resources across the owner and consumer Outposts accounts.

Figure 1. High-level overview of the prototype described in this post. The major components are AWS Outposts, Amazon Simple Notification Service (Amazon SNS), Amazon CloudWatch, and AWS Identity and Access Management (IAM).

Prototype interface overview

Let us start with the user-facing elements of the prototype. This interface overview demonstrates how administrators interact with the system, manage resources, and monitor usage across multiple accounts. Through a series of intuitive dashboards and control panels, users can effectively manage resource allocation, set thresholds, and track utilization.

Central management interface

The central management interface serves as the main control point and has two sections, as shown in Figure 2.

The resources portal allows users to configure and manage resources at both the Outposts and consumer account levels.
The dashboards portal provides access to monitoring dashboards that display detailed metrics and usage patterns at both the Outposts and consumer account levels.

Figure 2. Central management interface.

Resources portal

As part of the central management interface, the resources portal provides administrators with management capabilities for Outposts and consumer account resources. Through this portal, administrators can set and manage soft and hard limits for using resources.

When a consumer’s resource usage on the Outpost breaches the soft threshold, the solution sends an email notification to the Outposts owner. When usage breaches the hard threshold, the solution prevents the consumer from creating new resources of that type. When usage falls back below the hard threshold, the consumer’s ability to create new resources is automatically restored, unless it is still restricted by Outpost-level thresholds.
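The threshold behavior described above can be sketched as a small decision function. This is an illustration only: the type names, the function names, and the way thresholds are stored are assumptions, because the post does not show how the prototype represents them. A breach is treated as usage meeting or exceeding the threshold.

```typescript
// Hypothetical types: the post does not show how the prototype stores thresholds.
type ThresholdAction = 'none' | 'notify-owner' | 'block-creation';

interface Thresholds {
  soft: number; // breaching this triggers an email to the Outposts owner
  hard: number; // breaching this blocks creation of new resources of that type
}

// Map current usage onto the action described in the text.
// Assumption: "breach" means usage >= threshold.
function evaluateUsage(usage: number, thresholds: Thresholds): ThresholdAction {
  if (usage >= thresholds.hard) return 'block-creation';
  if (usage >= thresholds.soft) return 'notify-owner';
  return 'none';
}

// A consumer may create resources only while neither its own hard threshold
// nor the Outpost-level hard threshold is breached.
function canCreate(
  consumerUsage: number,
  consumer: Thresholds,
  outpostUsage: number,
  outpost: Thresholds,
): boolean {
  return (
    evaluateUsage(consumerUsage, consumer) !== 'block-creation' &&
    evaluateUsage(outpostUsage, outpost) !== 'block-creation'
  );
}
```

The second function captures the last sentence of the paragraph: dropping below the consumer-level hard threshold is not enough if the Outpost as a whole is still over its own limit.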

Figure 3 shows the Outpost-level thresholds and availability for an Amazon EC2 resource type.

Figure 3. Threshold for Amazon EC2 resource on Outpost level.

Figure 4 shows an example of a situation in which the Soft Threshold is modified to 5.

Figure 4. Example showing changing the Soft Threshold to 5.

Dashboards portal

As part of the central management interface, the dashboards portal provides detailed monitoring capabilities for every consumer account, along with an aggregated dashboard for the Outposts, as shown in Figure 5.

Figure 5. Dashboards portal.

Dashboards are also accessible from the Amazon CloudWatch console under Custom dashboards, as shown in Figure 6.

Figure 6. Access to dashboards from the Amazon CloudWatch console.

Focusing on the aggregated Outposts dashboard, you have access to the following metrics.

Connected status

The connected status metric, which reflects the Outposts connectivity with its parent AWS Region, is displayed as line and gauge graphs, as shown in Figure 7. If the value drops below 1, connectivity is impaired, which triggers an alarm for immediate action.

Figure 7. Connected status metric.
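As a rough sketch, an alarm on this metric could be defined along the following lines. The object mirrors a subset of the Amazon CloudWatch PutMetricAlarm request fields so that the example runs without the AWS SDK. Outposts publishes `ConnectedStatus` in the `AWS/Outposts` namespace with an `OutpostId` dimension; the alarm name, the 5-minute period, the choice to treat missing data as breaching, and the function name are assumptions rather than details taken from the prototype.

```typescript
// Local type mirroring the subset of PutMetricAlarm input fields used here.
interface AlarmDefinition {
  AlarmName: string;
  Namespace: string;
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  Statistic: 'Average' | 'Minimum' | 'Maximum' | 'Sum' | 'SampleCount';
  Period: number; // seconds
  EvaluationPeriods: number;
  Threshold: number;
  ComparisonOperator: 'LessThanThreshold' | 'GreaterThanOrEqualToThreshold';
  TreatMissingData: 'breaching' | 'notBreaching' | 'ignore' | 'missing';
  AlarmActions: string[];
}

// Hypothetical helper: builds an alarm that fires when connectivity drops below 1.
function connectedStatusAlarm(outpostId: string, snsTopicArn: string): AlarmDefinition {
  return {
    AlarmName: `outpost-${outpostId}-connected-status`, // assumed naming scheme
    Namespace: 'AWS/Outposts',
    MetricName: 'ConnectedStatus',
    Dimensions: [{ Name: 'OutpostId', Value: outpostId }],
    Statistic: 'Average',
    Period: 300, // assumed evaluation window
    EvaluationPeriods: 1,
    Threshold: 1,
    ComparisonOperator: 'LessThanThreshold', // any value below 1 means impaired connectivity
    TreatMissingData: 'breaching', // assumption: missing data is treated as a problem
    AlarmActions: [snsTopicArn],
  };
}
```

With the AWS SDK for JavaScript, an object of this shape could be passed to `PutMetricAlarmCommand` from `@aws-sdk/client-cloudwatch`.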

Amazon S3 utilization

As shown in Figure 8, Amazon S3 utilization is displayed for each consumer account, along with the remaining available resources within the Outposts. The dashboard also shows the rate of change in the total number of bytes used within the Outposts.

Figure 8. Amazon S3 on Outposts utilization.

Amazon EBS utilization

As shown in Figure 9, Amazon EBS utilization and the remaining available resources within the Outposts are displayed, along with the rate of change in the total number of gigabytes used in the Outposts.

Figure 9. Amazon EBS total utilization.

Amazon EC2 availability

The Amazon EC2 availability and rate of change for each instance type in the Outposts are displayed, along with a usage breakdown by the following:

Consumer account
Amazon Relational Database Service (Amazon RDS)
Ghost services, which are AWS managed services that consume Amazon EC2 capacity but aren’t visible as standard Amazon EC2 instances in your accounts, such as Elastic Load Balancing (ELB)

The ghost services utilization is calculated by subtracting both consumer accounts and Amazon RDS usage from the total Amazon EC2 capacity utilization within the Outposts.
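The subtraction can be written out as a short function. The function name and the clamping at zero are assumptions added for illustration; the post only states the formula.

```typescript
// Ghost-service usage is what remains of total Amazon EC2 utilization after
// subtracting consumer account usage and Amazon RDS usage.
function ghostServiceUsage(totalUsed: number, consumerUsed: number[], rdsUsed: number): number {
  const consumers = consumerUsed.reduce((sum, n) => sum + n, 0);
  // Assumption: clamp at zero, because metrics sampled at slightly different
  // times could briefly make the difference negative.
  return Math.max(0, totalUsed - consumers - rdsUsed);
}
```

For example, with 10 instances of a type in use on the Outpost, 3 and 4 of them in two consumer accounts, and 2 used by Amazon RDS, 1 instance is attributed to ghost services.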

Figure 10 shows a view of Amazon EC2 resources in Outposts, displaying instance type availability percentages, rate of change, and utilization by ghost services across different instance families.

Figure 10. View of Amazon EC2 instance type availability, rate of change, and utilization by ghost services.

Figure 11 shows the Amazon EC2 instance type count and capacity utilization metrics, displaying both available capacity and actual utilization percentages for different instance families (M5 and R5), tracked over multiple time periods.

Figure 11. Example showing aggregated utilization and availability for Amazon EC2 M5 and R5 instances.

Figure 12 shows utilization for each instance type, broken down by consumer account and Amazon RDS.

Figure 12. Example Amazon EC2 utilization at the consumer level.

Figure 13 shows consumer and Amazon RDS utilization at the instance type level, combined in a single graph.

Figure 13. Example showing Consumer and Amazon RDS utilization in a single graph.

Resource usage time

Resource usage time is displayed in minutes, as shown in Figure 14. It covers both Amazon S3 usage (in byte-minutes) and Amazon EC2 instance usage time across different instance types, and provides metrics that can be used for cost calculations.

Figure 14. Resources usage time in minutes.
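To illustrate how these usage-time metrics could feed a cost calculation, here is a minimal sketch of a per-consumer charge. Everything in it is hypothetical: the record shape, the rate structure, and the 30-day month are assumptions, and the rates in the usage example are made-up values, not AWS pricing.

```typescript
// Hypothetical shapes for a consumer's usage and the owner's internal rates.
interface UsageRecord {
  instanceMinutes: Record<string, number>; // instance type -> minutes of usage
  s3ByteMinutes: number; // Amazon S3 usage in byte-minutes
}

interface Rates {
  perInstanceMinute: Record<string, number>; // instance type -> charge per minute
  perGiBMonthS3: number; // charge per GiB stored for a month
}

const MINUTES_PER_MONTH = 30 * 24 * 60; // assumption: 30-day billing month
const BYTES_PER_GIB = 1024 * 1024 * 1024;

function chargeFor(usage: UsageRecord, rates: Rates): number {
  let total = 0;
  for (const instanceType of Object.keys(usage.instanceMinutes)) {
    const rate = rates.perInstanceMinute[instanceType];
    if (rate === undefined) throw new Error(`No rate defined for ${instanceType}`);
    total += usage.instanceMinutes[instanceType] * rate;
  }
  // Convert byte-minutes into GiB-months before applying the storage rate.
  const gibMonths = usage.s3ByteMinutes / BYTES_PER_GIB / MINUTES_PER_MONTH;
  return total + gibMonths * rates.perGiBMonthS3;
}

// Usage with made-up numbers: one hour of m5.large plus 1 GiB stored for a month.
const example = chargeFor(
  { instanceMinutes: { 'm5.large': 60 }, s3ByteMinutes: BYTES_PER_GIB * MINUTES_PER_MONTH },
  { perInstanceMinute: { 'm5.large': 0.5 }, perGiBMonthS3: 2 },
);
```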

Prototype overview

Having explored how administrators interact with the solution through its user interface, let’s examine the underlying technical architecture that powers these capabilities.

The technical implementation of this prototype operates across multiple AWS accounts. The owner account owns the Outpost, and consumer accounts share resources on it.

The solution deploys various AWS services across these accounts to create an integrated system for resource monitoring, control, and intervention. Figure 15 illustrates this architecture.

Figure 15. Architecture diagram of the solution described in this post. The major components are IAM, Amazon API Gateway, AWS WAF, Amazon SNS, Amazon EventBridge, AWS CloudTrail, Amazon CloudWatch, Amazon S3, Amazon EBS, Amazon EC2, and Outposts.

Let’s explore each of these components in detail to understand how they work together.

Resource usage control

You can set soft and hard usage thresholds for Amazon EC2, Amazon EBS, and Amazon S3 resources running on the Outposts.

The following thresholds are supported at the Outposts level:

Total number of running Amazon EC2 instances for each instance type
Total volume of Amazon EBS for each storage type
Total volume of Amazon S3

The following thresholds are supported at the consumer account level:

Number of running Amazon EC2 instances for each instance type
Amazon S3 volume

Resource monitoring

The resource monitoring component of the prototype continuously monitors resource usage at both the Outposts and consumer account levels. It compares current usage against predefined thresholds for each resource type.

Resource monitoring uses two approaches to monitor the Outposts resources: Amazon CloudWatch alarms and AWS service events.

Amazon CloudWatch alarms monitor Amazon S3 and Amazon EBS usage. The solution sets up alarms at both the Outposts and consumer account levels that are triggered when a predefined threshold is reached. These alarms generate events that initiate interventions when necessary.

It’s important to understand that CloudWatch metrics are aggregated and may have a delay of several minutes. As a result, interventions based on these metrics might occur after usage has already surpassed the set thresholds.

AWS service events are used to monitor Amazon EC2. When an Amazon EC2 instance changes state, such as transitioning to the pending state, an event is automatically generated in Amazon EventBridge and routed to the Outposts owner account.

This approach allows for near real-time interventions, significantly reducing the chance of consumer accounts exceeding their allocated limits.
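The following sketch shows what matching these events might look like. The `source`, `detail-type`, and `detail` fields follow the Amazon EC2 instance state-change notification format; the pattern restricted to the pending state, the `LaunchNotice` type, and the function name are assumptions, since the post does not show the prototype's EventBridge rules.

```typescript
// The fields this sketch reads from an EC2 instance state-change event.
interface Ec2StateChangeEvent {
  source: string;
  'detail-type': string;
  account: string;
  region: string;
  detail: { 'instance-id': string; state: string };
}

// Assumed event pattern: match only instances entering the pending state.
const launchPattern = {
  source: ['aws.ec2'],
  'detail-type': ['EC2 Instance State-change Notification'],
  detail: { state: ['pending'] },
};

// Hypothetical summary passed on for a usage check in the owner account.
interface LaunchNotice {
  account: string;
  region: string;
  instanceId: string;
}

// Returns a notice for matching launch events and null for anything else.
function toLaunchNotice(event: Ec2StateChangeEvent): LaunchNotice | null {
  const matches =
    launchPattern.source.includes(event.source) &&
    launchPattern['detail-type'].includes(event['detail-type']) &&
    launchPattern.detail.state.includes(event.detail.state);
  if (!matches) return null;
  return { account: event.account, region: event.region, instanceId: event.detail['instance-id'] };
}
```

The state-change event carries the instance ID and state but not the instance type, so a per-type limit check still needs a lookup such as DescribeInstances.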

When usage meets or exceeds a threshold, the resource monitoring component generates an in-alarm message. Conversely, when usage drops below the threshold, an out-of-alarm message is created. These alarm messages are processed in the owner account and pushed to an Amazon Simple Notification Service (Amazon SNS) topic.
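The in-alarm and out-of-alarm logic can be sketched as a transition check. The message fields and function name are hypothetical, and emitting a message only when the state changes is an assumption of this sketch; the post does not specify the message format or whether repeated messages are sent.

```typescript
type AlarmKind = 'in-alarm' | 'out-of-alarm';

// Hypothetical message shape published to the Amazon SNS topic.
interface AlarmMessage {
  kind: AlarmKind;
  consumerAccount: string;
  resourceType: string;
  usage: number;
  threshold: number;
}

// Compare the new reading with the previous alarm state and return a message
// only when the state changes.
function alarmTransition(
  previouslyInAlarm: boolean,
  consumerAccount: string,
  resourceType: string,
  usage: number,
  threshold: number,
): AlarmMessage | null {
  const inAlarm = usage >= threshold; // "meets or exceeds" the threshold
  if (inAlarm === previouslyInAlarm) return null;
  return {
    kind: inAlarm ? 'in-alarm' : 'out-of-alarm',
    consumerAccount,
    resourceType,
    usage,
    threshold,
  };
}
```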

Resource usage intervention

The resource usage intervention component subscribes to the Amazon SNS topic, allowing it to implement an intelligent intervention system to manage resource usage across consumer accounts. This system operates through two primary mechanisms: permission management through AWS Identity and Access Management (IAM) and resource termination (for Amazon EC2 only).

Permission management through IAM

The core of the intervention strategy uses IAM to dynamically adjust permissions. We assume each consumer account accesses resources through a well-defined IAM role. If this role isn’t specified, the solution creates one in the consumer account. When thresholds are breached, we modify these IAM roles to restrict resource creation.

This is implemented using an intervention handler AWS Lambda function that subscribes to the Amazon SNS topic and receives the alarm messages published by the resource monitoring component. The function responds to two types of messages: in-alarm messages and out-of-alarm messages.

In-alarm messages (indicating a threshold breach): The function attaches a deny-resource-creation policy to the affected consumer account’s IAM role.

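import { AttachRolePolicyCommand, IAMClient } from '@aws-sdk/client-iam';

// clientFactory, ResourceNames, and config are helpers from the prototype's codebase (not shown here).
async function denyAction(tenantAccount: string, policyName: string) {
  // Get an IAM client for the consumer (tenant) account using the intervention role.
  const iam = await clientFactory.forTenantAccount(
    IAMClient,
    tenantAccount,
    ResourceNames.interventionRoleName(config.namespace),
  );

  // Attach the deny policy to the role the consumer uses to access resources.
  const PolicyArn = ResourceNames.policyARN(clientFactory.partition, tenantAccount, policyName);
  await iam.send(
    new AttachRolePolicyCommand({
      RoleName: config.tenantRoleName,
      PolicyArn,
    }),
  );
}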
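The post does not show the deny policy itself. As an illustration only, a policy that blocks launches of a single instance type could look like the document built below. The statement ID, the function name, and the decision to deny only `ec2:RunInstances` are assumptions; `ec2:InstanceType` is a standard Amazon EC2 condition key.

```typescript
interface PolicyStatement {
  Sid: string;
  Effect: 'Allow' | 'Deny';
  Action: string[];
  Resource: string;
  Condition?: Record<string, Record<string, string>>;
}

interface PolicyDocument {
  Version: '2012-10-17';
  Statement: PolicyStatement[];
}

// Hypothetical builder for a deny-resource-creation policy for one instance type.
function denyRunInstancesPolicy(partition: string, instanceType: string): PolicyDocument {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: 'DenyLaunchOverHardLimit', // assumed statement ID
        Effect: 'Deny',
        Action: ['ec2:RunInstances'],
        Resource: `arn:${partition}:ec2:*:*:instance/*`,
        // Only launches of the instance type that breached its threshold are denied.
        Condition: { StringEquals: { 'ec2:InstanceType': instanceType } },
      },
    ],
  };
}
```

Because an explicit deny overrides any allow in IAM policy evaluation, attaching such a policy blocks launches without changing the role's existing permissions, and detaching it restores them.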

Out-of-alarm messages: The function removes the deny policy, restoring normal permissions.

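import { DetachRolePolicyCommand, IAMClient, NoSuchEntityException } from '@aws-sdk/client-iam';

// clientFactory, ResourceNames, config, and logger are helpers from the prototype's codebase (not shown here).
async function resumeAction(tenantAccount: string, policyName: string) {
  // Get an IAM client for the consumer (tenant) account using the intervention role.
  const iam = await clientFactory.forTenantAccount(
    IAMClient,
    tenantAccount,
    ResourceNames.interventionRoleName(config.namespace),
  );
  const PolicyArn = ResourceNames.policyARN(clientFactory.partition, tenantAccount, policyName);

  try {
    // Detach the deny policy to restore the consumer's normal permissions.
    await iam.send(
      new DetachRolePolicyCommand({
        RoleName: config.tenantRoleName,
        PolicyArn,
      }),
    );
  } catch (err) {
    if ((err as NoSuchEntityException).name === 'NoSuchEntityException') {
      logger.debug('Policy was already detached'); // not an error, is expected to happen
    } else {
      throw err;
    }
  }
}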
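The routing between these two actions is not shown in the post. A minimal sketch of how the handler might dispatch Amazon SNS messages follows. The `Records[].Sns.Message` path matches the event that Lambda receives from Amazon SNS; the message payload fields (`kind`, `tenantAccount`, `policyName`) and the function names are hypothetical.

```typescript
// Subset of the Lambda event delivered by an Amazon SNS subscription.
interface SnsEvent {
  Records: { Sns: { Message: string } }[];
}

// Hypothetical payload carried in each SNS message.
interface AlarmNotice {
  kind: 'in-alarm' | 'out-of-alarm';
  tenantAccount: string;
  policyName: string;
}

type InterventionAction = (tenantAccount: string, policyName: string) => Promise<void>;

// Parse and validate one message body, rejecting anything unexpected.
function parseNotice(message: string): AlarmNotice {
  const { kind, tenantAccount, policyName } = JSON.parse(message) as Partial<AlarmNotice>;
  if (kind !== 'in-alarm' && kind !== 'out-of-alarm') {
    throw new Error('Unrecognized alarm message kind');
  }
  if (typeof tenantAccount !== 'string' || typeof policyName !== 'string') {
    throw new Error('Alarm message is missing tenantAccount or policyName');
  }
  return { kind, tenantAccount, policyName };
}

// Build a handler from the two actions, for example denyAction and resumeAction.
function makeHandler(deny: InterventionAction, resume: InterventionAction) {
  return async (event: SnsEvent): Promise<void> => {
    for (const record of event.Records) {
      const notice = parseNotice(record.Sns.Message);
      const action = notice.kind === 'in-alarm' ? deny : resume;
      await action(notice.tenantAccount, notice.policyName);
    }
  };
}
```

Passing the actions in as parameters keeps the routing logic testable without AWS credentials.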

Resource termination (for Amazon EC2 only)

In addition to permission management, the solution can actively manage resources. For Amazon EC2, when the instance count exceeds the hard threshold, the solution can stop or terminate the most recently launched instances that are in the pending or running state, depending on the configured intervention mode.

import {
  EC2Client,
  Instance,
  InstanceStateName,
  StopInstancesCommand,
  TerminateInstancesCommand,
  paginateDescribeInstances,
} from '@aws-sdk/client-ec2';

// clientFactory, ResourceNames, config, and logger are helpers from the prototype's codebase (not shown here).
async function terminateEc2Instances(tenantAccount: string, instanceType: string, hardLimit: number) {
  const commonLogFields = {
    feature: 'EC2 Intervention',
    tenantAccount,
    instanceType,
    hardLimit,
  };
  logger.appendKeys(commonLogFields);

  try {
    logger.debug(`Starting enforcement`);

    if (config.ec2InterventionMode === 'none') {
      logger.debug(`Intervention mode is set to "none", taking no action`);
      return;
    }

    const ec2 = await clientFactory.forTenantAccount(
      EC2Client,
      tenantAccount,
      ResourceNames.interventionRoleName(config.namespace),
    );

    // Collect every pending or running instance of this type in the consumer account.
    const instances: Instance[] = [];
    for await (const page of paginateDescribeInstances(
      { client: ec2 },
      {
        Filters: [
          { Name: 'instance-state-name', Values: [InstanceStateName.pending, InstanceStateName.running] },
          { Name: 'instance-type', Values: [instanceType] },
        ],
      },
    )) {
      // Reservations and Instances are both optional in the SDK response types.
      instances.push(...(page.Reservations ?? []).flatMap((reservation) => reservation.Instances ?? []));
    }

    const totalRunning = instances.length;
    if (totalRunning <= hardLimit) {
      logger.debug(`Running instances are within the limit. Taking no action.`, { totalRunning });
      return;
    }

    // Select the most recently launched instances, as many as exceed the limit.
    const instancesToStopCount = totalRunning - hardLimit;
    const InstanceIds = [...instances]
      .sort((a, b) => (b.LaunchTime?.getTime() ?? 0) - (a.LaunchTime?.getTime() ?? 0))
      .slice(0, instancesToStopCount)
      .flatMap((instance) => (instance.InstanceId ? [instance.InstanceId] : []));

    if (!InstanceIds.length) {
      logger.debug(`No instances eligible to be stopped.`);
    } else if (config.ec2InterventionMode === 'stop') {
      logger.debug(`Attempting to stop instances.`, { InstanceIds });
      await ec2.send(new StopInstancesCommand({ InstanceIds }));
    } else if (config.ec2InterventionMode === 'terminate') {
      logger.debug(`Attempting to terminate instances.`, { InstanceIds });
      await ec2.send(new TerminateInstancesCommand({ InstanceIds }));
    }
  } finally {
    // Always clear the per-call log fields, including on the early returns above.
    logger.removeKeys(Object.keys(commonLogFields));
  }
}

This dual approach of permission management and resource termination provides a robust method for enforcing resource limits.

IAM-based interventions prevent new resource creation when limits are reached.
Resource termination helps bring usage back under thresholds quickly.

Conclusion

The Outposts resource sharing prototype demonstrates how AWS is committed to addressing the evolving needs of public sector organizations. By enhancing monitoring, control, and intervention capabilities, this solution enables more efficient use of Outposts resources while maintaining the security and compliance benefits critical to government agencies, educational institutions, and healthcare providers.

Jisc cloud solutions, an AWS Consulting Partner and an AWS public sector Solutions Provider, is already implementing this solution to serve the education and research sector, demonstrating the practical application and value of these enhanced resource sharing capabilities in real-world scenarios.

If this resource sharing solution is the right fit for your organization, reach out to your AWS solutions architect or contact AWS Support to discuss how this prototype can address your specific needs and to explore implementation options.

AWS Public Sector Blog