Amazon Web Services Blog
[Event Report] Engineers Weaving Fast Retailing’s Digital Transformation: A Global Challenge (Part 2)
Amazon Web Services Japan G.K. hosted Fast Retailing Co., Ltd. on July 24, 2024, for a 4.5-hour lecture on the theme of “Engineers Weaving Fast Retailing’s Digital Transformation: A Global Challenge”. Following up on the previous post, this blog post reports on the next part of the lecture.
Job titles are as of the day of the event.
The AWS Infrastructure Supporting Fast Retailing’s IT
Shigeru Horikawa, Director, Infrastructure, Digital Business Transformation Services, Fast Retailing Co., Ltd.
Mr. Horikawa from the Infrastructure team spoke about Fast Retailing’s history and the current status of AWS cloud utilization, as well as the ingenuity involved in operating a large-scale cloud environment with a small team.
Fast Retailing needs to connect over 3,500 stores globally, support an ecommerce (EC) business in around 30 markets, and help sustain the entire supply chain, including warehouses and factories worldwide. To keep up with rapid business expansion, the company needed a cloud platform that could scale globally and provision infrastructure quickly, rather than building data centers for each business. A major driver of cloud adoption was the 2017 EC site outage. Three days of downtime, combined with the difficulty of expanding on-premises capacity and controlling traffic, made the need for cloud adoption clear.
Fast Retailing’s cloud journey began around 2012 with the use of AWS for some campaign sites. In 2013, a cloud adoption project was launched in parallel with another initiative to modernize the existing legacy core systems. From 2016 to 2017, a large-scale migration (lift & shift) from on-premises to the cloud was carried out through the “Ariake Project”. Around 2021, global expansion accelerated further and AWS utilization expanded even more.
The benefits of cloud adoption included:
- Scalability and capacity assurance: Easy scaling up and capacity expansion as needed
- Architecture modernization: Shift from legacy systems to modern architectures like microservices
- Securing engineers: Easier hiring of cloud engineers and improved acquisition of partners and developers
- Availability assurance: Establishment of cost-efficient backup sites across Availability Zones and regions for disaster recovery
- Infrastructure cost optimization: Adjust resources between peak and off-peak times to optimize infrastructure costs
Current AWS usage includes over 60 accounts, 15+ regions, 13,000+ VMs, 3,000+ databases, and 120,000+ containers operated by a team of around 20 members. Various measures are taken to manage this large-scale infrastructure with a small team. Main initiatives include:
- Architecture reviews: Three review boards (Commerce, Enterprise, and CTO) review each project from diverse perspectives, covering everything from planning through operational requirements and security
- Automation and codification: Adoption of Infrastructure as Code (IaC), cost management automation, standardized automated AWS account creation, etc.
- Self-service: Application teams independently build, test, and deploy their code and take full responsibility for it
- Cost management: Tag-based cost allocation visualizes cost trends by system, so application teams can manage their own costs and take full responsibility for them
- Unified system monitoring platform: Consolidated disparate monitoring platforms so all the members can centrally view metrics and application behavior, reducing troubleshooting time
- Security measures: Leverage AWS security solutions for integrated multi-account security management. Aggregate security information into a security account integrated with a SIEM platform to enable early vulnerability detection and response
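As a rough illustration of the tag-based cost allocation mentioned in the list above, the following Python sketch groups billing line items by a tag value. The `system` tag key, the line-item layout, and all amounts are hypothetical; the talk did not describe the actual tagging scheme or tooling.

```python
from collections import defaultdict

# Hypothetical billing line items. The "system" tag key, the layout, and the
# amounts are illustrative assumptions, not Fast Retailing data.
LINE_ITEMS = [
    {"resource": "compute", "cost_usd": 120.0, "tags": {"system": "cart"}},
    {"resource": "database", "cost_usd": 300.0, "tags": {"system": "cart"}},
    {"resource": "search-domain", "cost_usd": 210.5, "tags": {"system": "search"}},
    {"resource": "compute", "cost_usd": 45.25, "tags": {}},  # untagged resource
]


def allocate_costs(line_items, tag_key="system", untagged_label="(untagged)"):
    """Sum cost per value of `tag_key`; untagged items get their own bucket."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item["tags"].get(tag_key, untagged_label)
        totals[owner] += item["cost_usd"]
    return dict(totals)


costs_by_system = allocate_costs(LINE_ITEMS)
for system, total in sorted(costs_by_system.items()):
    print(f"{system}: {total:.2f} USD")
```

Keeping an explicit bucket for untagged resources makes gaps in tagging visible, which matters when each application team is expected to own its costs.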
Future initiatives include broader use of Kubernetes beyond ECS containers for more granular self-service parameter control, strengthening regional infrastructure teams to accelerate global expansion and achieve follow-the-sun operations, and building a global project promotion system.
Through various measures like automation, self-service, and centralized monitoring, they have achieved large-scale cloud operations with a small team. Mr. Horikawa shared how companies accelerating global expansion can efficiently operate large-scale cloud environments using AWS.
A Look Behind the Scenes of UNIQLO and GU’s E-Commerce Apps
Shunsuke Akimoto, Core Engineering, Digital Business Transformation Services, Fast Retailing Co., Ltd.
Daisuke Sano, Core Engineering, Digital Business Transformation Services, Fast Retailing Co., Ltd.
Next, members of the Core Engineering team gave us a behind-the-scenes look at UNIQLO and GU’s e-commerce (EC) apps. First, Mr. Akimoto talked about a case of significant performance improvements and cost reductions in the cart function of the EC site.
The cart function plays an important role when selecting and purchasing items on GU and UNIQLO apps and web services. The previous architecture had Application Load Balancer on top, with Amazon Elastic Container Service (Amazon ECS) below that, and the Amazon ECS API Service reading data from Amazon Aurora. However, with this configuration, the database became a bottleneck during traffic surges like popular item sales, causing response times to deteriorate significantly. There was also the issue that upgrading database instances increased costs, putting pressure on the infrastructure budget.
To solve these issues, they decided to adopt Redis, an in-memory NoSQL data store, as the main database. By using Amazon ElastiCache, they achieved major performance improvements while ensuring high scalability.
The new architecture has the API Service directly reading and writing data to Amazon ElastiCache, with asynchronous data synchronization between Amazon ElastiCache and Amazon Aurora. This maximized the performance of Redis.
Notable implementation techniques included a custom locking mechanism built with Lua scripts and efficient extraction of updated carts using Redis Sorted Sets. The data migration was done with zero downtime by gradually moving data to Redis while operations continued.
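One way to picture how a sorted set can drive asynchronous synchronization is the sketch below. It is an in-memory Python stand-in, not Redis client code: the `DirtyCartIndex` class, the function names, and the choice of the last-update timestamp as the score are all assumptions made for illustration, and the team's actual design was not described in this detail.

```python
class DirtyCartIndex:
    """In-memory stand-in for a Redis Sorted Set.

    Assumption: member = cart ID, score = timestamp of the latest update.
    """

    def __init__(self):
        self._scores = {}

    def mark_updated(self, cart_id, timestamp):
        # Like ZADD: re-adding an existing member only overwrites its score.
        self._scores[cart_id] = timestamp

    def updated_up_to(self, cutoff):
        # Like ZRANGEBYSCORE with an upper bound: oldest updates first.
        hits = [(ts, cid) for cid, ts in self._scores.items() if ts <= cutoff]
        return [(cid, ts) for ts, cid in sorted(hits)]

    def remove_if_unchanged(self, cart_id, synced_timestamp):
        # Drop the entry only if no newer write arrived during the sync.
        # Against a real Redis this check-and-remove would need to be atomic.
        if self._scores.get(cart_id) == synced_timestamp:
            del self._scores[cart_id]
            return True
        return False


def write_cart(cache, index, cart_id, items, timestamp):
    """Request path: write to the cache and record the cart as updated."""
    cache[cart_id] = dict(items)
    index.mark_updated(cart_id, timestamp)


def sync_updated_carts(cache, index, database, cutoff):
    """Background path: copy carts updated up to `cutoff` into the database."""
    synced = []
    for cart_id, timestamp in index.updated_up_to(cutoff):
        database[cart_id] = dict(cache[cart_id])
        if index.remove_if_unchanged(cart_id, timestamp):
            synced.append(cart_id)
    return synced


cache, database, index = {}, {}, DirtyCartIndex()
write_cart(cache, index, "cart-1", {"sku-a": 1}, timestamp=100)
write_cart(cache, index, "cart-2", {"sku-b": 1}, timestamp=105)
write_cart(cache, index, "cart-1", {"sku-a": 2}, timestamp=110)

first_pass = sync_updated_carts(cache, index, database, cutoff=108)
second_pass = sync_updated_carts(cache, index, database, cutoff=120)
print(first_pass, second_pass)
```

Because each cart appears at most once in the index, repeated writes to a busy cart collapse into a single pending sync, so the background job only touches carts that actually changed.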
As a result, no performance degradation was observed even at more than double the traffic that had previously caused issues. The p95 response time improved significantly, from over 10 seconds to 160 ms, leaving headroom for much larger traffic increases. Costs were also greatly reduced, with additional optimization through the data tiering feature of Amazon ElastiCache for Redis.
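For readers less familiar with the p95 metric, the sketch below computes it with the nearest-rank method. The latency samples are made up for illustration and are not measurements from the talk; the helper name is likewise hypothetical.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (integer 1-100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1-100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [120] * 18 + [900, 12000]

p50 = percentile_nearest_rank(latencies_ms, 50)
p95 = percentile_nearest_rank(latencies_ms, 95)
print(f"p50={p50} ms, p95={p95} ms, max={max(latencies_ms)} ms")
```

The median hides the slow tail entirely, while p95 reports the latency that 95% of requests stay at or below, which is why tail percentiles are a common way to describe user-facing response times.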
Ultimately, database costs were reduced by over 60% with scalability assured for the next 10 years. They demonstrated how effectively leveraging cloud services can achieve both performance improvements and cost reductions simultaneously.
Next, Mr. Sano talked about the search platform. Fast Retailing’s systems comprise interconnected microservices for each business function, with a data integration platform and API integration. The search platform is a microservice providing search functionality, utilizing inventory, pricing, product, and other information from other platforms. Main functions include keyword search, creating product lists for category pages, keyword suggestions, similar product listings, in-store inventory search, etc.
Amazon OpenSearch Service is currently used by some platforms including the search platform. Three main features of Amazon OpenSearch Service are:
- Enabling registration of large amounts of data with flexible structures
- Enabling keyword search and flexible queries
- Fully managed service that easily scales in/out based on request volume
Advantages of Amazon OpenSearch Service include high performance and scalability for read-heavy applications, and reduced development effort because typical e-commerce search features are built in.
Search for global digital commerce must efficiently support multiple brands, countries/regions, and languages. The key requirements are:
- Functionality: Since building language-specific parsing in-house is difficult and inefficient, having built-in features like syntax analysis and normalization is important
- Ease of deployment: Adopting architecture where data automatically flows into Amazon OpenSearch Service
- Scalability: Easy scaling in/out
- Operability: Developing efficient tools for registering words and synonyms
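To make the normalization and synonym requirements above more concrete, here is a small Python sketch of query-term normalization followed by synonym expansion. It is a plain-Python illustration of the idea only, not the OpenSearch API; in practice these steps would typically be handled by the search engine's own analyzers and synonym filters. The synonym entries and function names are hypothetical.

```python
import unicodedata

# Hypothetical synonym dictionary of the kind an operations tool might maintain.
SYNONYMS = {
    "tee": ["t-shirt"],
    "pants": ["trousers"],
}


def normalize(term):
    """Fold full-width/compatibility characters and letter case."""
    return unicodedata.normalize("NFKC", term).casefold().strip()


def expand_query(query, synonyms):
    """Return normalized terms plus their registered synonyms, without duplicates."""
    terms = []
    for raw in query.split():
        term = normalize(raw)
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in terms:
                terms.append(candidate)
    return terms


# "Ｔｅｅ" is written with full-width characters, as is common in Japanese input.
expanded = expand_query("Ｔｅｅ PANTS", SYNONYMS)
print(expanded)
```

Every entry in such a dictionary has to be registered and maintained by someone, which is the operational cost the requirements list refers to.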
Future challenges include reducing the operational cost of registering words and synonyms. To address this, the team is considering machine learning-based technologies such as large language models, vector search, and semantic search. Other challenges are developing features that improve the user experience beyond keyword search, and expanding to new regions in parallel with improving the search platform.
Application Development Using Amazon ECS to Achieve Execution Control of Data Integration Infrastructure
Kei Sakamoto, Integration Engineering, Digital Business Transformation Services, Fast Retailing Co., Ltd.
Hirohito Yoshimoto, Integration Engineering, Digital Business Transformation Services, Fast Retailing Co., Ltd.
Next, Mr. Sakamoto from the Integration Engineering Team introduced their data integration platform initiatives.
The Integration Engineering Team’s responsibilities include improving system development efficiency, as well as the development, management, and operational excellence of the data integration platform at Fast Retailing. As a digital consumer retailing company investing in end-to-end processes, Fast Retailing saw its data integration and batch processing become complex at scale. This required streamlining data, preparing tools for data utilization, and achieving high-speed processing to handle the data volumes that grew after the multi-brand launch and multinational expansion.
They are addressing these challenges with two main concepts. First is preparing the batch processing framework, with a system that can control process dependencies and parallel distribution, organizing complex dependencies into more manageable forms. Second is preparing the data integration platform, building a cloud-native hub-and-spoke model platform to centrally manage all information and streamline data. Key features are easy routing control by definition, distributed deployment tailored to business domain characteristics, and covering various protocols and data formats.
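The idea of controlling process dependencies while running independent jobs in parallel can be sketched with Python's standard `graphlib` module (Python 3.9+). The job names and the dependency graph are hypothetical, and this is not the team's framework, only a minimal illustration of grouping jobs into waves that can run concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical batch jobs: each key maps to the set of jobs it depends on.
JOBS = {
    "load_products": set(),
    "load_prices": set(),
    "load_inventory": set(),
    "build_catalog": {"load_products", "load_prices"},
    "publish_search_index": {"build_catalog", "load_inventory"},
}


def plan_waves(dependencies):
    """Group jobs into waves; jobs within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock dependents
    return waves


waves = plan_waves(JOBS)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

A real scheduler would start dependents as soon as their own prerequisites finish rather than waiting for a whole wave, but the wave view is a convenient way to reason about how much parallelism a dependency graph allows.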
Four key requirements for this platform are:
- Stability: High stability required as it supports the business core
- Performance: Ability to plan for performance and capacity to withstand business fluctuations
- Portability: Easy distributed deployment
- Integration with diverse applications: Seamless integration with various applications
Afterwards, Mr. Yoshimoto introduced the specific architecture of the data integration platform, as well as the challenges and ingenuity involved in its development.
The data integration platform is designed as a hub-and-spoke model system to efficiently integrate data between business domains, currently operating as 10 services across 2 clouds and 6 regions. Its development history began in 2015 as a batch processing platform, gradually expanding functionality to evolve into a data integration platform. Containerization and infrastructure codification were implemented in 2019, with enhanced multi-cloud support in 2021.
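A hub-and-spoke model with routing control by definition might look roughly like the following Python sketch. The route entries, domain names, and the `Hub` class are hypothetical; the platform's real definition format, protocols, and deployment model were not shown in this level of detail.

```python
# Hypothetical routing definitions: which spokes receive which messages.
ROUTES = [
    {"source": "inventory", "message_type": "stock_changed",
     "targets": ["search", "ecommerce"]},
    {"source": "pricing", "message_type": "price_changed",
     "targets": ["search"]},
]


class Hub:
    """Central hub: spokes never call each other directly, only the hub."""

    def __init__(self, routes):
        self._table = {
            (route["source"], route["message_type"]): list(route["targets"])
            for route in routes
        }
        self._spokes = {}

    def register_spoke(self, name, handler):
        self._spokes[name] = handler

    def publish(self, source, message_type, payload):
        """Deliver `payload` to every registered target defined for this route."""
        delivered = []
        for target in self._table.get((source, message_type), []):
            handler = self._spokes.get(target)
            if handler is not None:
                handler(payload)
                delivered.append(target)
        return delivered


inboxes = {"search": [], "ecommerce": []}
hub = Hub(ROUTES)
hub.register_spoke("search", inboxes["search"].append)
hub.register_spoke("ecommerce", inboxes["ecommerce"].append)

delivered = hub.publish("inventory", "stock_changed", {"sku": "sku-a", "qty": 3})
unrouted = hub.publish("pricing", "unknown_event", {})
print(delivered, unrouted)
```

Because routing lives in data rather than in code, adding a new consumer for an existing message type is a change to the definitions, not to the publishing application.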
Key considerations in development and operation include environmental reproducibility, scalability, ease of customization, cloud vendor independence, etc. Regarding task execution management using Amazon ECS, they independently implemented solutions to address issues like rate limits on API calls per time unit and difficulty with synchronous task status management.
Specifically, the team runs each worker application inside a wrapper so that it operates as part of the overall framework, and the wrapper reports task status through heartbeats and callbacks so that the framework can track it synchronously. This avoids frequent polling of the ECS cluster for status checks and enables more timely monitoring and management of application execution.
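The heartbeat-and-callback approach can be pictured with the Python sketch below. The `TaskTracker` class, its method names, the status labels, and the timeout value are assumptions for illustration; the actual wrapper and framework interfaces were not presented. Timestamps are passed in explicitly so the example is deterministic.

```python
class TaskTracker:
    """Framework-side view of task state, fed by pushes from task wrappers.

    Status queries are answered from local state, so no call to the
    container orchestrator's API is needed to check on a task.
    """

    def __init__(self, heartbeat_timeout):
        self.heartbeat_timeout = heartbeat_timeout
        self._last_seen = {}
        self._status = {}

    def on_heartbeat(self, task_id, now):
        """Called periodically by the wrapper while the worker is running."""
        if self._status.get(task_id) in ("SUCCEEDED", "FAILED"):
            return  # ignore late heartbeats after completion
        self._last_seen[task_id] = now
        self._status[task_id] = "RUNNING"

    def on_completion(self, task_id, succeeded):
        """Callback sent by the wrapper when the worker exits."""
        self._status[task_id] = "SUCCEEDED" if succeeded else "FAILED"

    def status(self, task_id, now):
        state = self._status.get(task_id, "UNKNOWN")
        if state == "RUNNING" and now - self._last_seen[task_id] > self.heartbeat_timeout:
            return "LOST"  # heartbeats stopped without a completion callback
        return state


tracker = TaskTracker(heartbeat_timeout=30)
tracker.on_heartbeat("task-a", now=0)
tracker.on_heartbeat("task-b", now=0)
tracker.on_completion("task-b", succeeded=True)

print(tracker.status("task-a", now=10))  # still within the heartbeat window
print(tracker.status("task-a", now=45))  # heartbeat overdue
print(tracker.status("task-b", now=45))  # completed via callback
```

Pushing status from the task side keeps the number of orchestrator API calls independent of how often status is read, which is one way to stay clear of API rate limits.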
Three future challenges are:
- Further streamlining and self-service in deployment: Enabling business application teams to easily construct data integration mechanisms
- Enhancing integration with new services: Achieving stable integration with new digital reform services
- Promoting data utilization: Further integrating with data accumulation/analysis processes to create business value
In closing, the speakers explained that this initiative aims to contribute to the business by consolidating complex data integration and batch processing, leveraging cloud services appropriately, and packaging the result as a solution.
(To be continued in Part 3)
Original blog writers from AWS Japan:
Mariko Anan, SA, Retail
Yuto Miyoshi, SA, Retail