This Guidance demonstrates an end-to-end, near real-time anti-fraud system based on deep learning graph neural networks. This blueprint architecture uses Deep Graph Library (DGL) to construct a heterogeneous graph from tabular data and train a Graph Neural Network (GNN) model to detect fraudulent transactions.
Architecture Diagram
Near Real-Time Fraud Detection
Step 1
Use Amazon API Gateway to host HTTP APIs for the near real-time fraud detection services.
Step 2
Use AWS Lambda functions as the HTTP API backend. The functions process new transactions as graph data, then store them in a graph database such as Amazon Neptune.
Query the sub-graph of the requested transactions from Neptune.
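Step 3 might be implemented as a Gremlin traversal against Neptune. The sketch below only builds the query string; the vertex label `transaction` and property key `id` are assumptions and should match whatever labels your ETL job actually wrote to the graph.

```python
# Sketch of Step 3: build a Gremlin traversal that fetches the k-hop
# neighborhood (sub-graph) of a transaction vertex stored in Neptune.
# Label "transaction" and property "id" are assumed names, not fixed by
# this Guidance.

def subgraph_query(transaction_id: str, hops: int = 2) -> str:
    """Return a Gremlin query string for the k-hop sub-graph of a transaction."""
    return (
        f"g.V().has('transaction', 'id', '{transaction_id}')"
        f".repeat(both().simplePath()).times({hops})"
        ".path().unfold().dedup()"
    )
```

In the Lambda backend this string would typically be submitted to the Neptune Gremlin endpoint (for example with the gremlin_python client), and the returned vertices and edges become the model input.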
Step 4
Use an Amazon SageMaker endpoint to predict the probability that transactions are fraudulent with pre-trained GNN models.
Step 5
Send the predicted results to Amazon Simple Queue Service (Amazon SQS) to be consumed by business analysis systems.
Step 6
Use Lambda functions to poll the predicted results from Amazon SQS, then store them in Amazon DocumentDB.
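Step 6 can be sketched as a Lambda handler on an SQS trigger. The message fields, collection name, and the 0.5 threshold below are assumptions for illustration; the DocumentDB write is shown only as a comment because it needs a live cluster.

```python
import json

# Sketch of Step 6: consume prediction results from an SQS-triggered Lambda
# event and shape them as DocumentDB documents. Field names and the fraud
# threshold are assumptions, not defined by this Guidance.

def record_to_document(sqs_record: dict) -> dict:
    """Convert one SQS record body into a DocumentDB document."""
    body = json.loads(sqs_record["body"])
    return {
        "transactionId": body["transaction_id"],
        "fraudProbability": body["fraud_probability"],
        "isFraud": body["fraud_probability"] >= 0.5,  # example threshold
    }

def handler(event: dict, context=None) -> int:
    docs = [record_to_document(r) for r in event["Records"]]
    # Illustrative only: write with pymongo to the DocumentDB cluster, e.g.
    # pymongo.MongoClient(docdb_uri, tls=True)["fraud"]["predictions"].insert_many(docs)
    return len(docs)
```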
Step 7
Business analysts access the business dashboard, which uses Amazon CloudFront and Amazon Simple Storage Service (Amazon S3) to host a static website, with AWS AppSync and Lambda as the backend.
Step 8
Use Lambda functions as AWS AppSync resolvers to fetch the data from Amazon DocumentDB.
Step 9
CloudFront uses origin access identity (OAI) to securely access the static web files on Amazon S3.
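Steps 4 and 5 of the real-time flow can be sketched together: serialize the sub-graph, invoke the SageMaker endpoint, and forward the prediction to SQS. The endpoint name, queue URL, and payload shape below are placeholders; the actual payload depends on your inference code.

```python
import json

# Sketch of Steps 4-5: score a transaction sub-graph on a SageMaker
# endpoint and forward the prediction to Amazon SQS. All resource names
# are placeholders.

def build_inference_payload(transaction_id: str, subgraph: dict) -> str:
    """Serialize the sub-graph as the JSON body for the SageMaker endpoint."""
    return json.dumps({"transaction_id": transaction_id, "graph": subgraph})

def build_sqs_message(transaction_id: str, fraud_probability: float) -> str:
    """Message body consumed downstream by the business analysis system."""
    return json.dumps({
        "transaction_id": transaction_id,
        "fraud_probability": round(fraud_probability, 6),
    })

def score_transaction(transaction_id: str, subgraph: dict) -> None:
    # Illustrative only -- requires AWS credentials and deployed resources.
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName="fraud-detection-gnn",  # placeholder endpoint name
        ContentType="application/json",
        Body=build_inference_payload(transaction_id, subgraph),
    )
    prob = json.loads(resp["Body"].read())["probability"]  # assumed response shape
    boto3.client("sqs").send_message(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/fraud-results",
        MessageBody=build_sqs_message(transaction_id, prob),
    )
```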
Offline Model Training
Step 1
System operators or a periodic system task initiates the model training workflow.
Step 2
Use a Lambda function to ingest the raw dataset into Amazon S3.
Step 3
Use an AWS Glue crawler to crawl the raw dataset and populate the Data Catalog.
Step 4
Use AWS Glue extract, transform, load (ETL) job to transform the tabular dataset to a heterogeneous graph dataset, then save it to Amazon S3.
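The core of Step 4 is turning tabular rows into typed node and edge lists. The plain-Python sketch below shows the idea; column names such as `card_id` and `device_id` are assumptions, and a real AWS Glue ETL job would do this with Spark DataFrames and write the result back to Amazon S3.

```python
# Sketch of Step 4: transform tabular transaction records into a
# heterogeneous graph representation (node sets per type, edge lists per
# relation). Column and relation names are illustrative assumptions.

def to_hetero_graph(rows):
    """rows: iterable of dicts with transaction_id, card_id, device_id keys."""
    nodes = {"transaction": set(), "card": set(), "device": set()}
    edges = {
        ("transaction", "paid_with", "card"): [],
        ("transaction", "used", "device"): [],
    }
    for row in rows:
        t, c, d = row["transaction_id"], row["card_id"], row["device_id"]
        nodes["transaction"].add(t)
        nodes["card"].add(c)
        nodes["device"].add(d)
        edges[("transaction", "paid_with", "card")].append((t, c))
        edges[("transaction", "used", "device")].append((t, d))
    return nodes, edges
```

Sharing a card or device node between two transactions is what lets the GNN propagate fraud signals between otherwise unrelated rows.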
Step 5
Use the SageMaker training job to train the Graph Neural Network (GNN)-based fraud detection model with Deep Graph Library (DGL).
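Step 5 could be launched with a `create_training_job` request. The sketch below only builds the parameter dictionary; the job name, role ARN, container image, S3 URIs, instance type, and hyperparameters are all placeholders to substitute with your own account's values.

```python
# Sketch of Step 5: parameters for a SageMaker training job that runs a
# DGL-based GNN training script. All names, URIs, and hyperparameters are
# placeholder assumptions.

def training_job_params(job_name: str, role_arn: str, image_uri: str) -> dict:
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,   # container with DGL and the training code
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://<bucket>/graph-dataset/",  # placeholder
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": "s3://<bucket>/model-artifacts/"},
        "ResourceConfig": {"InstanceType": "ml.g4dn.xlarge",
                           "InstanceCount": 1, "VolumeSizeInGB": 50},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        "HyperParameters": {"n-epochs": "100", "n-layers": "2"},  # examples
    }
    # Submit with: boto3.client("sagemaker").create_training_job(**params)
```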
Step 6
Use AWS Fargate with Amazon Elastic Container Service (Amazon ECS) to load the graph dataset from Amazon S3 into Neptune, a fully managed graph database service.
Step 7
Use Lambda to package the GNN model and custom code as a SageMaker model.
Step 8
Create a SageMaker endpoint configuration.
Step 9
Create or update an endpoint using the endpoint configuration in SageMaker.
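Steps 7 through 9 map to three SageMaker API calls. The sketch below builds the endpoint configuration as a pure dictionary and wraps the boto3 calls in a function, since they require credentials and a trained model artifact; all names, ARNs, and the instance type are placeholders.

```python
# Sketch of Steps 7-9: register the trained model, create an endpoint
# configuration, and create (or update) the serving endpoint. All resource
# names and ARNs are placeholder assumptions.

def endpoint_config_params(config_name: str, model_name: str) -> dict:
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",  # example instance type
        }],
    }

def deploy(model_name: str, model_data_url: str, image_uri: str, role_arn: str):
    # Illustrative only -- requires AWS credentials and a model artifact in S3.
    import boto3
    sm = boto3.client("sagemaker")
    sm.create_model(
        ModelName=model_name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    config_name = f"{model_name}-config"
    sm.create_endpoint_config(**endpoint_config_params(config_name, model_name))
    endpoint = "fraud-detection-endpoint"  # placeholder endpoint name
    if sm.list_endpoints(NameContains=endpoint)["Endpoints"]:
        sm.update_endpoint(EndpointName=endpoint, EndpointConfigName=config_name)
    else:
        sm.create_endpoint(EndpointName=endpoint, EndpointConfigName=config_name)
```

Updating an existing endpoint with a new configuration rolls out the new model without dropping in-flight inference traffic.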
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
This Guidance uses AWS serverless services such as AWS Glue, SageMaker, AWS Fargate, and Lambda as compute resources for processing data, training models, and serving the API, keeping billing on pay-as-you-go pricing. One of the data stores is built on Amazon S3, providing a low total cost of ownership for storing and retrieving data. The business dashboard uses CloudFront, Amazon S3, AWS AppSync, and Lambda to implement the web application.
Security
API Gateway and Lambda provide a protection layer when invoking Lambda functions through an outbound API. All the proposed services support integration with AWS Identity and Access Management (IAM), which can be used to control access to resources and data. All traffic between services in the VPC is controlled by security groups.
Reliability
API Gateway, Lambda, AWS Step Functions, AWS Glue, Amazon S3, Neptune, Amazon DocumentDB, and AWS AppSync provide high availability within a Region. Customers can deploy SageMaker endpoints in a highly available manner.
Performance Efficiency
All the services used in the design provide Amazon CloudWatch metrics that can be used to monitor individual components of the design. MLOps pipelines orchestrated by Step Functions help to continuously iterate the model. API Gateway and Lambda allow publishing of new versions through an automated pipeline.
Cost Optimization
This Guidance requires GNN model training for fraud detection. The performance requirements for batch processing range from minutes to hours; AWS Glue and SageMaker training jobs are designed to meet them. Neptune is a purpose-built, high-performance graph database engine. Neptune efficiently stores and navigates graph data, and uses a scale-up, in-memory optimized architecture for fast query evaluation over large graphs. Provisioned concurrency in Lambda and the HTTP API in API Gateway can support a latency requirement of less than 10 ms.
Sustainability
This Guidance uses the scaling behaviors of Lambda and API Gateway to reduce over-provisioning of resources. It uses managed AWS services to maximize resource utilization and to reduce the amount of energy needed to run a given workload.
Implementation Resources
A detailed guide is provided for you to experiment with and use within your AWS account. It examines each stage of the Guidance, including deployment, usage, and cleanup.
The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.
Related Content
Build a GNN-based real-time fraud detection solution using Amazon SageMaker, Amazon Neptune, and the Deep Graph Library
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.