AWS for Industries
Your Telecom Cloud Journey on AWS: Part 1 – Establishing a Foundation
Introduction
This three-part series provides guidance on establishing a platform, or Telco Cloud, for Communications Service Providers (CSPs) that are migrating telecommunications (Telco) workloads to AWS. Drawing on the experiences of Telco users who have already made the journey, we cover the key operational and technical considerations for migrating complex Telco workloads such as Core Networks, Radio Access Networks (RAN), and Voice systems to AWS. These posts can help you make better decisions and implement a well-architected platform to support your transformation journey. The series consists of:
- Your telecom cloud journey on AWS: Part 1 – Establishing a Foundation (this post)
- Your telecom cloud journey on AWS: Part 2 – A technical roadmap with AWS
- Your telecom cloud journey on AWS: Part 3 – Optimizing cloud operations on AWS for telecom excellence
Many CSPs have already started their journey to the public cloud by migrating less complex IT, operations support systems (OSS), and business support systems (BSS) applications, and they are now considering how to migrate Telco workloads to AWS. Telco workloads have unique technical, operational, security, and governance requirements, so these posts focus on how an AWS environment can be created or adapted to support them.
The journey to the public cloud
The vast majority of Telcos are in the process of transitioning from their private clouds and virtualization environments to the public cloud. In most cases, this has so far involved lifting and shifting workloads that have clear public cloud migration paths, such as standalone IT workloads, Commercial Off-The-Shelf (COTS) packages, and custom OSS/BSS systems. The next phase is to move on-premises and edge network services from traditional network devices to open source software, Software-Defined Networking (SDN), and Network Function Virtualization (NFV) in the public cloud.
The following diagram illustrates the relatively linear journey of most IT organizations as they have transitioned from on-premises hardware platforms, such as mainframes and Unix, to virtualization, open source software, cloud adoption, and ultimately greater agility driven by cloud native architectures, managed services, DevOps, insights from data, and artificial intelligence/machine learning (AI/ML). Compare this with the more frenetic timeline of the Telco network community, which started its shift to open source and virtualization later but is now trying to gain a competitive advantage and reduce operating costs through greater agility, automation, speed, and insights. However, it has not yet built up the same “muscle memory” that IT organizations have developed over time.
Figure 1: Telco journey to public cloud
This series focuses on the next set of cloud challenges that need to be addressed to support testing, deploying, and running network devices, their control plane, and any associated OSS/BSS systems on AWS. Specifically, we look at how Telco network business units can gain the IT cloud “muscle memory” by using existing people, processes, and technologies. We provide guidance on accelerating migrations by reducing the friction in the transition from on-premises to public cloud through capabilities that support networking, security, resilience, performance, observability, and AI requirements along with adapting current ways of working to be successful in the cloud.
Establishing a Landing Zone
A Landing Zone is a well-architected, multi-account AWS environment that serves as a starting point from which you can deploy workloads and applications. It provides a baseline for multi-account architecture, identity and access management, governance, security, networking, and logging on AWS. The majority of CSPs that have started their public cloud journey have targeted less complex IT workloads and have already established a Landing Zone to support them.
Users are often faced with an important choice of whether to run IT and Telco workloads within the same AWS Organization/Landing Zone (Option 1) or to deploy separate Organizations/Landing Zones: one for Telco and one for IT (Option 2), as shown in the following figure.
Figure 2: Landing Zone Approach
The “Combined Landing Zone” approach has the benefit of reusing an existing platform, or establishing a single shared Landing Zone, for both Telco and IT workloads. Because operational processes and technical capabilities are shared, the platform must accommodate the unique requirements of Telco workloads. This approach can become complex if the two domains have different technical requirements, operational teams, change processes, and service level expectations.
The “Separated Telco & IT Landing Zone” approach allows greater operational, change, security, and automation flexibility by splitting the environment into two Organizations. However, in addition to the management overhead of running multiple Organizations, this approach has some cost implications: for example, usage is no longer aggregated into a single consolidated bill, which can reduce volume pricing discounts.
Agreeing on an approach requires discussion with, and involvement from, key stakeholders in the Telco, IT, and Operations teams in your business. Questions to consider include:
- Is the current Landing Zone well-architected and can it be scaled to new requirements?
- What are the regulatory requirements for the workloads? Do they differ significantly for Telco workloads?
- What are the change and release processes? Can they be aligned across the Telco and IT businesses?
- What are the observability requirements, and can they be shared between IT and Telco?
- Can the technical architecture of the Landing Zone, such as networking and the Domain Name System (DNS), accommodate both domains within a single Landing Zone? If so, what additional complexity does this add?
- What is the organizational risk of sharing a Landing Zone?
The next section covers the main capability differences that drive the decision of which approach should be selected.
Foundational capabilities
The AWS Cloud Foundations framework identifies 29 capabilities across six areas: Governance, Risk, and Compliance (GRC); Operations; Security; Finance; Infrastructure; and Business Continuity. These capabilities are the key building blocks that enable you to deploy, operate, and govern your workloads. Although most of the core components remain the same, Telco workloads require these capabilities to be evolved or extended, whether within an existing platform or when establishing a new one.
We have mapped these capabilities to show where the differences lie, grouping them into three buckets based on how the requirements of IT and Telco workloads usually compare: “Same”, “Possibly Different for Telco”, and “Different for Telco”, as shown in the following diagram.
Figure 3: Telco and IT foundational capabilities mapping
This analysis shows that in most areas we can reuse the practices already established for IT workloads. However, in specific areas we can expect adjustments to be needed to cater for Telco workloads. The following sections focus on the “Possibly Different for Telco” and “Different for Telco” categories:
Governance, Risk, and Compliance
- Governance: Telco workloads are likely to have different business and regulatory requirements. In addition, the operational model for Telco workloads is expected to differ.
- Service Onboarding: Telco workloads usually require cloud services that may not already be in use, for example, AWS Outposts and AWS Wavelength. CSP applications are normally purchased from an Independent Software Vendor (ISV) rather than developed in-house like many IT workloads, which influences the onboarding requirements.
- Change Management: Change processes usually differ, reflecting stricter service level agreements (SLAs) and uptime requirements for Telco services such as critical voice systems.
Operations
- Observability: With large amounts of data collected and analyzed in near real time, some of it sensitive (for example, subscriber data subject to regulatory requirements), the observability patterns for Telco workloads are likely to be different and correlated across a different set of third-party and AWS native tools.
Security
- Security and Incident Response: Security and incident response teams may be separated from IT with specific knowledge and skills of the technologies used in the Telco domain.
- Vulnerability and Threat Management: As with “Security and Incident Response”, the Telco domain often relies on ISV software, which requires joint processes with the ISVs to manage the patching and upgrading of infrastructure systems.
- Identity and Access Management: With numerous ISVs involved in deploying and operating Telco workloads, these vendors need access to AWS accounts and workloads. Permissions management must be well designed, shifting from operational users having highly privileged access on-premises to least-privilege access in the cloud.
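To make the least-privilege point for ISV access more concrete, the following is a minimal sketch in Python that builds an IAM role trust policy allowing a single ISV AWS account to assume a role in a Telco account only when it presents an agreed external ID (the `sts:ExternalId` condition key). The account ID, the external ID, and the `isv_trust_policy` helper are hypothetical placeholders. The trust policy only controls who can assume the role; the permissions policy attached to the role would still need to be scoped to the tasks that ISV performs.

```python
import json

# Hypothetical placeholder values for illustration only.
ISV_ACCOUNT_ID = "111122223333"
EXTERNAL_ID = "example-isv-external-id"


def isv_trust_policy(isv_account_id: str, external_id: str) -> dict:
    """Build a trust policy that lets one ISV account assume a role,
    but only when the agreed external ID is supplied in the AssumeRole call."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trust the ISV's account; its own IAM policies decide which
                # of its principals may call sts:AssumeRole on this role.
                "Principal": {"AWS": f"arn:aws:iam::{isv_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON document, e.g. for use in your IaC tooling.
    print(json.dumps(isv_trust_policy(ISV_ACCOUNT_ID, EXTERNAL_ID), indent=2))
```

Creating one such role per ISV, rather than sharing a highly privileged role, keeps each vendor's access separately auditable and revocable.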
Infrastructure
- Workload Isolation: Dedicated environments need to be created for Telco workloads, meeting their isolation requirements through central policies such as Service Control Policies (SCPs).
- Network Security: Network security is expected to be applied differently, with enhancements to optimize latency and a combination of ISV and AWS capabilities to achieve secure, high-throughput, monitored communications.
- Network Connectivity: Because most CSPs use a hybrid connectivity model, Telco services require dedicated, reliable connectivity between on-premises networks and AWS. The result is highly resilient, diverse WAN connections with complex segmentation between environments.
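As an illustration of the Workload Isolation capability above, the following minimal Python sketch builds an SCP that could be attached to a hypothetical Telco organizational unit (OU). It assumes two isolation goals chosen purely for illustration: restricting Telco accounts to approved AWS Regions (using the `aws:RequestedRegion` condition key) and preventing member accounts from leaving the Organization. The Region list, the exempted global-service actions, and the `telco_isolation_scp` helper are assumptions, not a recommended baseline.

```python
import json

# Hypothetical Regions approved for Telco workloads.
APPROVED_REGIONS = ["eu-west-1", "eu-central-1"]

# Illustrative subset of global-service actions exempted from the Region check,
# because global services are not tied to the caller's chosen Region.
# AWS's documented Region-restriction example exempts a longer list.
GLOBAL_SERVICE_ACTIONS = [
    "iam:*",
    "organizations:*",
    "sts:*",
    "support:*",
    "route53:*",
    "cloudfront:*",
]


def telco_isolation_scp(approved_regions: list) -> dict:
    """Build an SCP document for a (hypothetical) Telco OU."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny regional actions requested outside the approved Regions.
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": approved_regions}
                },
            },
            {
                # Stop Telco accounts from removing themselves from the guardrails.
                "Sid": "DenyLeavingOrganization",
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(telco_isolation_scp(APPROVED_REGIONS), indent=2))
```

SCPs only set the maximum permissions available to accounts in the OU and never grant access themselves, so IAM policies within each account still need to allow the actions. Attaching Telco-specific SCPs to a Telco OU lets that domain evolve its guardrails without affecting IT accounts.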
Business Continuity
- Support: The support model may need to differ, drawing on specialist skills to identify and resolve issues on the Telco platform and applications.
Landing Zone sharing recommendations
In addition to the preceding capabilities, if the Landing Zone is shared between Telco and IT, then the following should also be considered:
Consider separation of foundational accounts
It is an AWS best practice to create foundational accounts that are used to manage resources across a multi-account Landing Zone. A good example is the “shared-network” account, where services such as centralized network inspection, AWS Direct Connect, and AWS Transit Gateway are deployed and shared across the Landing Zone. We recommend using separate foundational accounts for Telco to minimize the impact of changes, increase change flexibility, and allow least-privilege access.
Separate critical Landing Zone resources
Consider which resources are shared within the Landing Zone, because Telco workloads are critical infrastructure and usually have dedicated resources. For example, a common pattern is to share redundant Direct Connect circuits across applications in a Landing Zone. However, because Telco workloads need maximum resilience, low latency, and high bandwidth, we recommend dedicated Direct Connect circuits for them. This provides not only technical benefits but also operational ones, such as allowing separate change domains.
Design platform CI/CD to support operational flexibility
Provisioning an AWS platform with automation and Infrastructure as Code (IaC) is a best practice, using pipelines to build, validate, and deploy changes. For Telco workloads, we recommend delivering changes into the environment through separate pipelines run from a dedicated account, so that release, change, and test processes can be tailored to that domain.
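One way to enforce the separate-pipeline recommendation is to give each Telco workload account a deployment role that only a specific role in the dedicated Telco pipeline account can assume. The following minimal Python sketch builds such a trust policy; the pipeline account ID, the `telco-platform-pipeline` role name, and the `deployment_role_trust_policy` helper are hypothetical.

```python
import json

# Hypothetical values for illustration only.
TELCO_PIPELINE_ACCOUNT_ID = "444455556666"
PIPELINE_ROLE_NAME = "telco-platform-pipeline"


def deployment_role_trust_policy(pipeline_account_id: str, pipeline_role_name: str) -> dict:
    """Build a trust policy for the deployment role in a Telco workload account.

    The principal is a single role ARN in the Telco pipeline account, so only
    that pipeline role can assume the deployment role through this policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{pipeline_account_id}:role/{pipeline_role_name}"
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(
        deployment_role_trust_policy(TELCO_PIPELINE_ACCOUNT_ID, PIPELINE_ROLE_NAME),
        indent=2,
    ))
```

Naming a specific pipeline role, rather than the whole pipeline account, keeps the Telco change domain distinct: pipelines running elsewhere, such as in the IT environment, are not trusted to deploy into Telco accounts.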
Summary
In this post, we saw that IT teams within CSPs have gained experience with the public cloud, and many network teams are now starting to migrate more complex and critical network workloads. A key outcome of this journey is gaining the agility, cost, efficiency, and speed benefits of open source and cloud native workloads. These workloads should run on a foundational Landing Zone that allows them to be migrated securely to AWS. This raises a key decision: whether to reuse an existing IT Landing Zone or establish a new Landing Zone for Telco workloads. The capabilities needed by workloads in the two domains can differ, so the technical, operational, and security complexity of merging them into a single Landing Zone should be carefully evaluated. CSPs are successfully using both approaches on AWS, and key stakeholders need to be involved to agree on the most suitable approach for your Network Transformation on AWS. Read Part 2 and Part 3 of this series.