AWS Public Sector Blog
Eutelsat increases service availability by migrating to AWS
Introduction
Eutelsat Group, the world’s first satellite operator to provide an integrated geosynchronous equatorial and low-Earth orbit (GEO-LEO) infrastructure, recently migrated their existing on-premises commercial Hadoop cluster to Amazon Web Services (AWS). By leveraging AWS reliability, scalability, and security, Eutelsat helps customers analyse data in real time with enhanced system availability across three AWS Availability Zones. As a result, Eutelsat reduced licensing costs by 50 percent, increased service availability to more than 99.8 percent, and decreased incidents.
Original system
Launched in January 2020, Eutelsat Konnect is a high-capacity satellite which delivers broadband services across Europe and Africa to help bridge the digital divide in areas where terrestrial networks are not deployed or offer poor performance. Powered by a global fleet of 36 geostationary satellites and associated ground infrastructure, Eutelsat provides 75 gigabits per second (Gbps) of capacity across a network of 65 different areas of coverage, enabling customers to reliably provide video, data, and fixed and mobile broadband services. The following maps show Konnect’s area of coverage in Africa and Europe.
To keep pace with consumer demand for over-the-top (OTT) streaming services, Eutelsat is growing connectivity at a high rate. Demands on handling and managing data for connectivity services have increased from 100 gigabits (Gb) per day to multiple terabytes (TB) per day. Furthermore, the launch of four new satellites in 2022 puts more demands on Eutelsat data processing and analysis resources.
To address these needs, Eutelsat incorporated an on-premises commercial Hadoop cluster in its data center. This was implemented as an analytics system based on a MySQL database and SAP Data Services to analyze telemetry data from the Konnect satellite and other geostationary satellites. The cluster consisted of 15 nodes, of which 6 were data nodes with 64-core CPUs, 256 GB of RAM, and 12 TB of total storage each, with a standard replication factor of three.
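As a rough illustration of what a replication factor of three means for capacity, the sketch below divides raw HDFS storage by the replication factor. It assumes the 12 TB figure applies to each of the 6 data nodes; if it is instead the cluster total, the same formula gives about 4 TB. The function name is hypothetical, and the estimate ignores filesystem overhead and space reserved for non-HDFS use.

```python
def usable_capacity_tb(raw_tb: float, replication_factor: int) -> float:
    """Approximate usable HDFS capacity when every block is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_tb / replication_factor


DATA_NODES = 6
STORAGE_PER_NODE_TB = 12  # assumption: 12 TB is per data node
REPLICATION_FACTOR = 3

raw_tb = DATA_NODES * STORAGE_PER_NODE_TB
usable_tb = usable_capacity_tb(raw_tb, REPLICATION_FACTOR)
print(f"raw: {raw_tb} TB, usable: {usable_tb} TB")
```

Under these assumptions, 72 TB of raw disk yields roughly 24 TB of usable space, which is one reason growing data volumes put pressure on a cluster where storage and compute scale together.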
The cluster was configured for high availability but lacked a disaster recovery site, resulting in a potential single point of failure for satellite telemetry data analysis. This could have impacted business and analytics operations because restoring from backups would take a few days. Scaling the cluster horizontally and vertically was a complex task that required specialized skills and was constrained by the tight coupling of storage and compute. Moreover, only historical data was analyzed, in batch mode, and the system lacked real-time components.
Eutelsat migrated the cluster to AWS to overcome the limitations of their existing data platform and provide a more flexible, scalable, and highly available system.
New system
The new Eutelsat Data Lakehouse system on AWS is deployed on Amazon Elastic Compute Cloud (Amazon EC2) instances in private subnets in a virtual private cloud (VPC) over three Availability Zones. The data is stored on Amazon Simple Storage Service (Amazon S3), providing high availability and durability.
The system is integrated with a third-party directory for single sign-on (SSO). For high availability, Eutelsat’s data center is connected to AWS with two 1 Gbps AWS Direct Connect dedicated connections, one in Turin, Italy, and one in Paris, France.
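To put the 1 Gbps link speed in context with the multi-terabyte daily volumes mentioned earlier, here is a back-of-the-envelope calculation. It is only a sketch: it uses decimal units, ignores protocol overhead, and assumes a single link carries the traffic (the source does not say how traffic is shared between the two connections). The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def max_daily_transfer_tb(link_gbps: float, utilization: float = 1.0) -> float:
    """Upper bound on data moved per day over one link, in decimal terabytes."""
    bits_per_day = link_gbps * 1e9 * SECONDS_PER_DAY * utilization
    return bits_per_day / 8 / 1e12


def transfer_hours(data_tb: float, link_gbps: float) -> float:
    """Hours needed to move data_tb terabytes over a link running at full speed."""
    bits = data_tb * 1e12 * 8
    return bits / (link_gbps * 1e9) / 3600


print(f"{max_daily_transfer_tb(1):.1f} TB/day at most over a 1 Gbps link")
print(f"{transfer_hours(1, 1):.2f} hours to move 1 TB over a 1 Gbps link")
```

A fully used 1 Gbps link tops out at about 10.8 TB per day, so ingesting a few terabytes per day fits within one connection with headroom, at least on paper.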
Figure 1 shows the target architecture implemented on AWS on three Availability Zones in the eu-central-1 Region (Frankfurt). Compute capacity is provided by Amazon EC2 instances, and Amazon Relational Database Service (Amazon RDS) is configured in high availability for Hadoop services such as the Hive Metastore.
Figure 1. Target architecture implemented on AWS on three Availability Zones in the eu-central-1 Region. Compute capacity is provided by Amazon EC2 instances, and Amazon RDS is configured in high availability for Hadoop services such as the Hive Metastore. The connection to AWS from Eutelsat’s data center is implemented with AWS Direct Connect. Data is stored in HDFS on Amazon Elastic Block Store (Amazon EBS) volumes attached to Amazon EC2 instances and in Amazon S3. Amazon S3 and Amazon DynamoDB are connected to the system with Amazon VPC endpoints.
The new system comprises the following components:
- YARN, Hive, and Spark
- Master nodes (3 x m5.8xlarge)
- Worker nodes (6 x r5d.8xlarge, 2 nodes per AZ)
- Edge nodes (2 x m5.2xlarge)
- NiFi (9 x m5.2xlarge, 3 nodes per AZ)
- ZooKeeper (3 x m5.xlarge)
- Kudu and Impala (6 x m5.2xlarge)
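The component list above can be summarized as a small inventory. The sketch below totals instance count, vCPUs, and memory using the published specifications of each EC2 instance type; the role labels and data structure are just one way to model it, not Eutelsat's actual tooling.

```python
# Published specs per instance type: (vCPUs, memory in GiB).
INSTANCE_SPECS = {
    "m5.xlarge": (4, 16),
    "m5.2xlarge": (8, 32),
    "m5.8xlarge": (32, 128),
    "r5d.8xlarge": (32, 256),
}

# (role, instance type, node count), taken from the list above.
CLUSTER = [
    ("Master nodes", "m5.8xlarge", 3),
    ("Worker nodes", "r5d.8xlarge", 6),
    ("Edge nodes", "m5.2xlarge", 2),
    ("NiFi", "m5.2xlarge", 9),
    ("ZooKeeper", "m5.xlarge", 3),
    ("Kudu and Impala", "m5.2xlarge", 6),
]


def cluster_totals(cluster):
    """Return (instances, vCPUs, memory GiB) summed over all roles."""
    instances = vcpus = memory_gib = 0
    for _role, instance_type, count in cluster:
        type_vcpus, type_memory = INSTANCE_SPECS[instance_type]
        instances += count
        vcpus += count * type_vcpus
        memory_gib += count * type_memory
    return instances, vcpus, memory_gib


instances, vcpus, memory_gib = cluster_totals(CLUSTER)
print(f"{instances} instances, {vcpus} vCPUs, {memory_gib} GiB memory")
```

For the list above this gives 29 instances, 436 vCPUs, and 2,512 GiB of memory.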
Decoupling storage and compute allows both components to be managed and scaled independently, which provides flexibility and makes it possible to handle peak load without overprovisioning.
By moving to AWS, Eutelsat started analysing data in real time with Kudu and Impala. The outputs of real-time analysis are interpolated with data coming from other systems to eventually manage satellite settings.
Challenges and lessons learnt
The migration project was managed and executed together with Agile Lab, which Eutelsat selected based on Agile Lab’s ability to analyse its existing data landscape. Eutelsat also relied on Agile Lab to design and implement a migration strategy from their old platform to AWS, leveraging their extensive experience on similar projects.
Now, combined with AWS scale, speed, and flexibility, the resulting platform can handle cost-effective, real-time migration capabilities.
During the migration, the new and old systems ran in parallel. NiFi flows were exported from the old cluster to the new one using templates, replacing the PutHDFS processor with PutCDPObjectStore so that the data is stored on Amazon S3. Hive tables were migrated with a custom API that converts them into optimized row columnar (ORC) format with a Hive command on the old cluster and copies them to the new cluster with WebHDFS REST APIs.
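The custom migration API itself is not published, so the following is only a sketch of the two steps described above: a HiveQL statement that rewrites a table in ORC format, and WebHDFS REST URLs for reading a file from one cluster and writing it to another. Host names, ports, database, table, and file paths are all hypothetical, and the helper functions are illustrative rather than part of any library.

```python
from urllib.parse import quote, urlencode


def orc_conversion_sql(database: str, table: str, suffix: str = "_orc") -> str:
    """Build a HiveQL CREATE TABLE AS SELECT statement that rewrites a table as ORC."""
    source = f"`{database}`.`{table}`"
    target = f"`{database}`.`{table}{suffix}`"
    return f"CREATE TABLE {target} STORED AS ORC AS SELECT * FROM {source}"


def webhdfs_url(host: str, port: int, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS URL of the form http://host:port/webhdfs/v1/<path>?op=..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical names, for illustration only.
FILE_PATH = "/warehouse/telemetry.db/konnect_raw_orc/000000_0"
sql = orc_conversion_sql("telemetry", "konnect_raw")
read_url = webhdfs_url("old-namenode.example.internal", 9870, FILE_PATH, "OPEN")
write_url = webhdfs_url(
    "new-namenode.example.internal", 9870, FILE_PATH, "CREATE", overwrite="true"
)

print(sql)
print(read_url)
print(write_url)
```

In a real copy, the `OPEN` URL would be fetched with an HTTP GET and the `CREATE` URL used with an HTTP PUT, which WebHDFS answers with a redirect to a data node before the file body is sent. Authentication and error handling are omitted here.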
To optimize data processing on Amazon S3, data write procedures were modified to avoid the generation of small files and enable data compaction. Configuration settings such as the replication factor were modified for HDFS and Kudu. Custom Ansible scripts were developed to automate configuration tasks such as certificate management, DMS management, and LDAP authentication management of Apache Knox endpoints.
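The source does not describe how compaction was implemented. As one possible approach, the sketch below plans a compaction by greedily grouping small files into batches that stay within a target size, so each batch can be rewritten as a single larger object. The function name, file names, and target size are assumptions made for illustration.

```python
def plan_compaction(file_sizes: dict, target_bytes: int) -> list:
    """Greedily group files (visited in name order) into batches of at most target_bytes.

    A file larger than target_bytes ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for name, size in sorted(file_sizes.items()):
        # Start a new batch when adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


MB = 1024 * 1024
small_files = {
    "part-0001": 40 * MB,
    "part-0002": 50 * MB,
    "part-0003": 30 * MB,
    "part-0004": 100 * MB,
    "part-0005": 10 * MB,
}
for batch in plan_compaction(small_files, target_bytes=100 * MB):
    print(batch)
```

Fewer, larger objects generally reduce the number of requests and the per-file overhead when query engines scan data on Amazon S3, which is the motivation behind avoiding small files in the first place.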
Conclusion
By using AWS, the Eutelsat engineering team can focus on dataflows and data management tasks instead of infrastructure and system administration tasks.
The result is a 35 percent improvement in time to market and increased business agility, enabling Eutelsat to quickly and securely scale the cluster up or down in hours instead of weeks based on computing needs. Additionally, because data is stored on Amazon S3 and decoupled from the cluster, the Eutelsat team can work on the cluster without the risk of any impact on company data.
Learn more about data platform migrations to AWS. Organizations of all sizes across all industries are transforming and delivering on their aerospace and satellite missions every day using AWS. Find out more about cloud for aerospace and satellite solutions so you can start your own AWS Cloud journey today.
Get inspired. Watch our AWS in Space story.
Read more about AWS for aerospace and satellite:
- Amazon and AWS to reimagine space station operations and logistics for Orbital Reef
- AWS in Orbit Ep1: Safe railways, healthy forests with LiveEO
- AWS in Orbit Ep2: Closing the digital divide with Kacific
- AWS in Orbit podcast: Leveraging generative AI to do more at the rugged space edge with AWS
- AWS selects 13 start-ups for the 2023 Space Accelerator
- Managing the world’s natural resources with earth observation
- AWS joins the Digital IF Interoperability (DIFI) Consortium
- How Natural Resources Canada migrated petabytes of geospatial data to the cloud
- How Satellogic and AWS are harnessing the power of space and cloud
Contributing Authors: Olga Shoraka, AWS; Miriam Puertos, AWS; Bethenie Hope, AWS; Mathilde Perdaems, AWS.