AWS Partner Network (APN) Blog
Modernize Legacy Batch Job Platform Using Event-Driven Cloud-Native Architecture
By Mittal Bhiogade, Chief IT Architect – Resolution Life
By Hitesh Bagchi, Chief Architect – Cognizant
By Pratip Bagchi, Sr. Partner Solutions Architect – AWS
Large enterprise customers depend heavily on commercial off-the-shelf (COTS) products for scheduling or running batch jobs. These products play a crucial role in integrating multiple disparate applications by providing services to schedule, orchestrate, manage, and monitor batch jobs, enabling process automation and improving efficiency.
During the digital transformation journey, some customers prefer to lift and shift these COTS products to avoid changes in their enterprise ecosystem. Many others take the opportunity to move to a modern architecture, primarily to escape proprietary license lock-in and achieve optimal operational excellence when running predictable scheduled batch jobs.
In most cases, enterprises run custom batch jobs built on top of a COTS platform. While custom batch jobs can be migrated to make them more cloud-friendly, modernizing the COTS platform itself poses a different kind of challenge.
These COTS products are often not cloud-native, which prevents them from taking advantage of various cloud services. Though they can be deployed to hosted cloud infrastructure, using Amazon Elastic Compute Cloud (Amazon EC2) for compute and Amazon Relational Database Service (Amazon RDS) as the database platform, their use of the cloud is limited to infrastructure as a service (IaaS).
In this post, we will describe how Resolution Life US (RLUS) collaborated with Cognizant and Amazon Web Services (AWS) to modernize its on-premises third-party enterprise job scheduler with an event-driven custom scheduler solution using Amazon FSx, AWS Step Functions, Amazon EventBridge, and AWS Lambda.
This modernized serverless solution helped Resolution Life US achieve an operational cost reduction of over 40% by eliminating commercial license costs. It also enabled complete end-to-end process automation with over 4x performance efficiency.
Resolution Life US is a reputable U.S. life insurance group focused on the acquisition and management of portfolios of life insurance policies. RLUS is a leading provider to life and annuity customers in the United States, delivering a data-driven, artificial intelligence (AI)-enabled, and people-focused service. RLUS serves the needs of over 1.1 million policyholders while managing over US$29 billion of assets.
Cognizant is an AWS Premier Tier Services Partner with six AWS Competency designations. Cognizant is also a member of the AWS Managed Cloud Services Provider (MSP) and AWS Well-Architected Partner Programs.
Legacy Architecture and Challenges
The RLUS enterprise systems consist of several COTS applications that form the foundation of critical life insurance business processes. Batch jobs on these applications are used for reinsurance premium calculations, securities valuations, scenario generations, financial ledger management, and other standard life insurance processes.
In the existing system, the batch jobs of these applications were orchestrated by a commercial enterprise job scheduler, leveraging agents co-located on the same virtual machines as the COTS jobs to manage and monitor jobs across multiple servers.
Cognizant’s goal was to replace the enterprise scheduler with an event-driven custom solution using AWS serverless services. This would eliminate the dependence on third-party software and give the customer a pathway out of proprietary license lock-in.
In the legacy architecture, data feeds from upstream source systems were dropped through managed file transfer (MFT) to a Windows shared drive. Batch schedulers initiated the batch processes for each schedule depending on the source and trigger files. The batch job cycles were orchestrated based on batch type, input source files, and output trigger files generated by a preceding batch job.
Broadly, the complete batch processes consisted of the following components:
- Enterprise scheduler: The enterprise batch scheduler initiated job cycles based on schedules and source files. It was responsible for orchestrating batch jobs, managing job dependencies, and maintaining job execution statuses.
- COTS jobs: These long-running batch jobs were components of the COTS products and were responsible for processing the incoming source feeds and generating output feeds along with trigger files. These output feeds were consumed by batch jobs in other job cycles and/or by downstream applications.
- Custom jobs: These short-running batch jobs were developed using .NET and performed various housekeeping activities, such as converting EBCDIC input files to ASCII, merging multiple output files into one, and zipping multiple output files into a single bundle. The purpose of these custom batch jobs was to automate various job management tasks.
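As an illustration of the housekeeping work these custom jobs performed, the EBCDIC-to-ASCII conversion can be sketched in a few lines of Python. The function name is hypothetical, and code page 037 (IBM EBCDIC US/Canada) is an assumption; the actual code page would come from the source system's specification.

```python
def ebcdic_to_ascii(src_path: str, dst_path: str, codepage: str = "cp037") -> None:
    """Convert an EBCDIC-encoded input file to an ASCII text file.

    cp037 is a single-byte encoding, so chunked decoding is safe at
    any chunk boundary.
    """
    with open(src_path, "rb") as src, \
         open(dst_path, "w", encoding="ascii", errors="replace") as dst:
        # Stream in 64 KB chunks so large mainframe feeds are not
        # loaded into memory at once.
        for chunk in iter(lambda: src.read(64 * 1024), b""):
            dst.write(chunk.decode(codepage))
```

Similar small utilities handled the merging and zipping steps, each wrapped in a Lambda-friendly entry point.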
Modernized Architecture
This implementation is part of RLUS’s migration and modernization to AWS as part of its digital transformation strategy. This was driven by the organization’s enterprise mission and long-term vision to achieve operational excellence and cost efficiency using AWS as the platform of choice.
Enterprise batch schedulers typically implement a variety of batch scheduling and management functions. Cognizant implemented the subset of these functions relevant to this use case.
Salient features and highlights of the solution include:
- Adopts an agentless approach: unlike commercial job schedulers, the solution deploys no job agents to the virtual machines executing the COTS jobs.
- Creates new jobs by developing AWS Lambda functions and adding new entries to the job configuration table, with no code changes required to the job framework.
- Handles multiple jobs running concurrently, with externally configurable job dependencies and execution triggers. It can handle both time-based and file-based triggers.
- Captures and stores job-level execution statistics for each job.
- Handles job failures through multiple retries and notifies IT support of job statuses using its integrated notification mechanism.
- Has a generic and reusable notification mechanism to deliver notifications to multiple communication channels, including email, Teams, and SMS with the ability to plug in new channels on demand.
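To make the configuration-driven design concrete, a job entry in the configuration table might look like the following sketch. All attribute names are illustrative assumptions, not the production schema.

```python
# Hypothetical job-configuration item as it might be stored in DynamoDB.
JOB_CONFIG = {
    "job_id": "reinsurance-premium-calc",
    "trigger_type": "file",              # "file" or "schedule"
    "trigger_file": "PREMIUM_INPUT.trg", # illustrative trigger-file name
    "depends_on": ["securities-valuation"],
    "max_retries": 3,
    "notify": ["email", "teams"],
}

def matches_trigger(config: dict, event: dict) -> bool:
    """Return True when an incoming event should launch this job."""
    if config["trigger_type"] == "file":
        return event.get("file_name") == config["trigger_file"]
    return event.get("source") == "schedule"
```

Because triggers, dependencies, and retry limits live in data rather than code, onboarding a new job is a configuration change plus a new Lambda function.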
Figure 1 – Target solution architecture.
Execution Steps
- An input file is received by MFT.
- MFT pushes input files to the data lake built on Amazon Simple Storage Service (Amazon S3).
- Data lake pushes input file to Windows share drop zone:
- Windows file share generates a Windows event and publishes it to Amazon CloudWatch Logs.
- CloudWatch log stream has a subscription filter which triggers a Lambda function in the shared account.
- The Lambda function in the shared account processes the event and puts a custom event on the FSx custom event bus in Amazon EventBridge.
- An event rule on the FSx custom event bus is triggered by the custom event.
- Amazon EventBridge publishes the custom event to the production account’s COTS custom event bus.
- Job launcher Lambda function is triggered:
- By a file event.
- By a scheduled event.
- Job launcher Lambda function reads an Amazon DynamoDB table to retrieve job configuration metadata.
- Job launcher Lambda function triggers an AWS Step Functions state machine whose workflow orchestrates the batch jobs:
- Step Functions triggers a custom batch job Lambda function whose name is passed as an input to the state machine.
- Lambda jobs generate trigger files in the Windows file share.
- Step Functions triggers the data lake file transfer Lambda function.
- Files are dropped into MFT via the data lake.
- Step Functions triggers the job audit Lambda function to capture the job execution steps.
- Job execution steps are stored in a DynamoDB job audit table.
- Step Functions sends the job outcome to the email notifier Lambda function (AWS Email Notifier).
- The email notifier Lambda function formats the email and sends it to Amazon Simple Notification Service (SNS).
- SNS publishes notifications to a Microsoft Teams channel.
- MFT pushes files to destination systems.
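The shared-account hop in the steps above hinges on decoding the payload that CloudWatch Logs delivers to a subscription-filter Lambda (a gzip-compressed, base64-encoded JSON document) and republishing it to a custom event bus. A minimal sketch follows; the event source, detail type, and bus name are assumptions, not the production values.

```python
import base64
import gzip
import json

def decode_log_events(event: dict) -> list:
    """Decode the gzip+base64 payload that CloudWatch Logs delivers to
    a subscription-filter Lambda, returning its log events."""
    payload = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    return json.loads(payload)["logEvents"]

def handler(event, context):
    # Illustrative only: forward each file-audit log event to a custom
    # EventBridge bus. Source and bus names are hypothetical.
    import boto3
    events = boto3.client("events")
    for log_event in decode_log_events(event):
        events.put_events(Entries=[{
            "Source": "custom.fsx.audit",
            "DetailType": "FileAuditEvent",
            "Detail": json.dumps({"message": log_event["message"]}),
            "EventBusName": "fsx-custom-event-bus",
        }])
```

An EventBridge rule on the custom bus can then route these events to the production account's bus, keeping the COTS virtual machines entirely agent-free.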
The following AWS services were used in the solution architecture, along with the context in which each was used:
- AWS Lambda triggers the Step Function which then triggers the job cycle for a given event. Such an event could be both file-based and scheduled. The job launcher Lambda function checks DynamoDB’s job configuration table to trigger the appropriate job cycle. Lambda is also used to implement the custom batch jobs that perform the housekeeping functions.
- Amazon FSx acts as the primary storage layer for all input, output, and trigger files for the COTS batch jobs. The solution leverages the file access auditing feature of Amazon FSx for Windows File Server. Using this feature, the solution can trigger an event when read/write operations are executed on a directory in a Windows shared drive that has been configured to generate Windows security audit events.
- Amazon Simple Notification Service (SNS) sends notifications to the IT support Teams channel via AWS Lambda.
- Amazon EC2 is used to deploy the COTS applications. This is mandatory, as the applications can only be deployed to a physical or virtual host.
- Amazon RDS for SQL Server Standard 2019 is used as the target database for the COTS applications. An instance of SQL Server is a prerequisite for running these applications.
- Amazon CloudWatch Logs captures the Windows security audit log generated when read/write operations are executed on the FSx for Windows shared drive.
- Amazon EventBridge triggers event-based and schedule-based rules that invoke a Lambda function or publish events across event buses in separate AWS accounts.
- AWS Step Functions orchestrates the Lambda functions.
- Amazon DynamoDB stores the job configurations metadata and job execution data.
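Putting the Lambda, DynamoDB, and Step Functions pieces together, the job launcher might look like the following sketch. The table name, state machine ARN, and attribute names are hypothetical placeholders.

```python
import json

def build_execution_input(job_config: dict, trigger_event: dict) -> str:
    """Compose the JSON input passed to the Step Functions state machine.

    Field names are illustrative, not the production contract.
    """
    return json.dumps({
        "jobId": job_config["job_id"],
        "jobLambda": job_config["lambda_function"],
        "triggerFile": trigger_event.get("file_name"),
        "retryLimit": int(job_config.get("max_retries", 3)),
    })

def launch_job(job_id: str, trigger_event: dict) -> None:
    # boto3 calls are illustrative; resource names are assumptions.
    import boto3
    table = boto3.resource("dynamodb").Table("job-configuration")
    job_config = table.get_item(Key={"job_id": job_id})["Item"]
    boto3.client("stepfunctions").start_execution(
        stateMachineArn=(
            "arn:aws:states:us-east-1:111111111111:"
            "stateMachine:batch-orchestrator"
        ),
        input=build_execution_input(job_config, trigger_event),
    )
```

Keeping the input-building logic separate from the AWS calls makes the launcher easy to unit test without a live account.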
The solution maximizes use of AWS serverless services to deliver a fully managed, serverless, event-driven solution that is cost-optimized and scalable. It enables integration of non-cloud-native COTS applications into an event-driven enterprise workflow.
Customer Benefits
- Cost savings: Over 40% savings in operational costs by eliminating use of licensed products while achieving complete end-to-end process automation.
- Performance gain: Over 4X gain in performance through use of AWS serverless services that scale horizontally compared to the legacy on-premises monolith batch application.
- Process improvement: Complete end-to-end automation achieved for COTS batch jobs without needing COTS applications to support cloud natively.
- Reusable batch framework: A reusable enterprise-scale serverless batch framework which can be configured to support new COTS or custom-built batch jobs.
Conclusion
In this post, we demonstrated how Cognizant solved a customer use case by adopting cloud-native AWS services to implement an event-driven, robust, highly scalable, and highly performing architecture.
The solution worked backwards from the customer to modernize their legacy third-party proprietary COTS platform and custom batch jobs into a modern, event-driven cloud-native architecture.
These are some useful links to AWS documentation and blogs on related topics:
- File access auditing is now available for Amazon FSx for Windows File Server
- Implementing security notifications for end-user activity on Amazon FSx for Windows File Server
- AWS documentation: File access auditing
Cognizant – AWS Partner Spotlight
Cognizant is an AWS Premier Tier Services Partner and MSP that transforms customers’ business, operating, and technology models for the digital era by helping organizations envision, build, and run more innovative and efficient businesses.