AWS Machine Learning Blog

Prevent fake account sign-ups in real time with AI using Amazon Fraud Detector

Implementing an effective fraud prevention system is one of the top priorities for businesses that operate online web or mobile platforms. Businesses report millions of dollars of lost revenue each year due to fraud. Platform abuse and fraud prevention remain largely reactive, relying on studying a user’s profile behavior and transaction history after they sign up. This approach is often manual, time-consuming, and expensive. Early detection and prevention of fraudulent account sign-ups on online platforms using artificial intelligence (AI) is an effective defense mechanism for combating fraud and abuse.

This post shows how you can use Amazon Fraud Detector in real time along with Amazon Cognito custom authentication workflows to prevent fake account sign-ups. Amazon Fraud Detector is a fully managed service that can identify potentially fraudulent online activities, such as creation of fake accounts or online payment fraud. Plus, you can use it without the need for any prior machine learning (ML) expertise. Unlike general-purpose ML packages, Amazon Fraud Detector is designed specifically to detect fraud.

Amazon Cognito lets you add user sign-up, sign-in, and access control to your web and mobile applications quickly and easily. It’s serverless, and can scale up to millions of users. I also discuss how you can use Amazon Pinpoint to track user sign-up flow events via user journeys and categorize users into segments. This is useful for user profiles and activity analysis in order to run effective marketing or promotional campaigns while maintaining a quality user experience.

Solution overview

In its general design, the solution uses an Amazon Fraud Detector supervised ML model along with a customized Amazon Cognito sign-up workflow to implement a real-time new user fraud prevention mechanism for online web and mobile applications. It also uses Amazon DynamoDB and AWS Lambda to customize the Amazon Cognito sign-up workflow. The following diagram illustrates the high-level architecture.

High-level architecture diagram of real-time fraud prevention using Amazon Fraud Detector and Amazon Cognito

Using Amazon Fraud Detector Online Fraud Insights

Amazon Fraud Detector Online Fraud Insights is a supervised ML model designed to detect a variety of online fraud. You can use Online Fraud Insights to detect fraudulent accounts during the sign-up process. The model generates a model score between 0 and 1,000. The higher the score, the higher the risk of the new account being fraudulent.

Because it’s a supervised ML model, your model accuracy may vary depending on the quality and maturity of the labeled training data. The training dataset must include at least two features in addition to the two required fields, EVENT_TIMESTAMP and EVENT_LABEL. Using more features may help achieve higher model accuracy and lower false positive rates. Amazon Fraud Detector provides information on the importance of the features used in training the model, which is useful for addressing model overfitting or underfitting. The training dataset can be prepared with data from an existing fraud prevention system by following the data preparation guidance. In this case, the Amazon Fraud Detector model is trained with a labeled dataset with the following features.

| Feature | Description |
| --- | --- |
| ip_address | User’s public IP address |
| email_address | User’s email address |
| user_agent | The User-Agent request header value |
| billing_state | User’s postal address state |
| billing_postal | User’s zip or postal code |
| billing_address | User’s billing address |
| phone_number | User’s phone number |
| EVENT_TIMESTAMP | Required EVENT_TIMESTAMP variable |
| EVENT_LABEL | The label (fraud or legitimate) |
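As a quick sanity check before uploading training data, the following sketch verifies that a CSV header satisfies the minimum described above: both required columns are present, plus at least two other features. The function name `check_training_header` is hypothetical and not part of any AWS SDK; it only inspects the header row and does not validate values.

```python
import csv
import io

REQUIRED_COLUMNS = {"EVENT_TIMESTAMP", "EVENT_LABEL"}
MIN_FEATURES = 2  # per the text: at least two features besides the required columns


def check_training_header(csv_text: str) -> list:
    """Return a list of problems found in the CSV header (empty if none)."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    columns = [name.strip() for name in header]

    problems = []
    missing = REQUIRED_COLUMNS - set(columns)
    if missing:
        problems.append("missing required columns: " + ", ".join(sorted(missing)))

    # Everything that is not a required column counts as a feature.
    features = [name for name in columns if name and name not in REQUIRED_COLUMNS]
    if len(features) < MIN_FEATURES:
        problems.append(
            f"need at least {MIN_FEATURES} features, found {len(features)}"
        )
    return problems
```

An empty list means the header meets the minimum; any returned strings describe what to fix before training.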

Amazon Fraud Detector also provides a way to define rules that tell the detector how to interpret the inference outcome. These rules can be defined using the rule language. A set of three specific rules is defined for this solution:

  • Low fraud risk – For a model score of 650 or less
  • Medium fraud risk – For a model score greater than 650 and up to 850
  • High fraud risk – For a model score over 850
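To make the thresholds concrete, here is a minimal Python sketch of the same three-way split. In the solution itself these thresholds live in the detector’s rules, not in application code; the function name `risk_tier` and the tier labels are hypothetical, and treating a score of exactly 850 as medium risk is an assumption based on the wording of the rules.

```python
LOW_MAX = 650      # scores up to and including 650 are low risk
MEDIUM_MAX = 850   # scores above 650 and up to 850 are treated as medium risk


def risk_tier(score: float) -> str:
    """Map a model score (0 to 1,000) to 'low', 'medium', or 'high'."""
    if not 0 <= score <= 1000:
        raise ValueError(f"score out of range: {score}")
    if score <= LOW_MAX:
        return "low"
    if score <= MEDIUM_MAX:
        return "medium"
    return "high"
```

Keeping the cutoffs as named constants makes it easier to adjust them as model accuracy changes.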

You can define fewer or additional rules depending on the use case and the overall model accuracy. For the purposes of this solution, I defined three distinct user sign-up flows depending on which rule the model score outcome conforms to:

  • For low fraud risk evaluation outcomes, users can complete the registration process successfully.
  • For medium fraud risk evaluation outcomes, we want to introduce additional friction in the registration process. This involves a human identity verification step—a verification challenge code sent to their email, and (optionally) solving a CAPTCHA.
  • For high fraud risk evaluation outcomes, we want to prevent the user from registering in our application, capture all available data, and optionally alert an administrator.

Attack vector considerations

A fraud attack vector is a mechanism by which bad actors obtain fraudulent access to an application in order to exploit the system. The most common fraud attack vector is sign-up attempts by users using synthetic identities, such as disposable emails or email tumbling. These methods involve using a unique email address for every sign-up attempt. Fraudulent sign-up attempts are carried out by individual users, groups of users, or automated systems (bots). A more sophisticated fraud attack vector involves collusive behavior, also known as collusion fraud. In this scenario, a group of users gains access to the system and performs transactions in coordination with each other to game the system to their advantage.

Disposable email address domains can be identified by maintaining a list of known disposable email address domains in a DynamoDB table, and validating the email address against that list. Fraud graphs with Amazon Neptune provide a way to identify email tumbling and collusion fraud. Neptune is a fast, reliable, and fully managed graph database that can store fraud graphs and find relationships between the new user and existing users. With fraud graphs, you can use commonalities between user profiles such as the same postal address, phone numbers, and IP addresses to detect email tumbling or collusion fraud attempts. The following diagram shows an example of this process.

Checking whether an email address uses a disposable domain, and detecting email tumbling or collusion fraud using an identity graph in the Amazon Neptune graph database
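The disposable-domain and tumbling checks described above can be sketched in plain Python. This is an illustration under stated assumptions, not the solution’s code: an in-memory set stands in for the DynamoDB table of disposable domains, the domain names are placeholders, and the tumbling heuristic (dropping a `+tag` suffix and dots from the local part) is only one possible normalization. Some providers treat dots as significant, so the heuristic can produce false matches. The graph-based detection with Neptune is not covered here.

```python
def split_email(email: str) -> tuple:
    """Return (local_part, domain), lowercased; raise ValueError if malformed."""
    local, _, domain = email.strip().lower().rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return local, domain


def is_disposable(email: str, disposable_domains: set) -> bool:
    """True if the email's domain is on the list of disposable domains."""
    return split_email(email)[1] in disposable_domains


def canonical_email(email: str) -> str:
    """Collapse common tumbling variants into one canonical address."""
    local, domain = split_email(email)
    local = local.split("+", 1)[0].replace(".", "")  # heuristic, see lead-in
    return f"{local}@{domain}"


def is_tumbled(email: str, known_canonical: set) -> bool:
    """True if the canonical form matches an address that already signed up."""
    return canonical_email(email) in known_canonical
```

In practice, `known_canonical` would be looked up from the store of existing users rather than held in memory.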

Custom Amazon Cognito user pool workflow

Amazon Cognito manages user sign-up and sign-in through a user directory known as a user pool. User pools let you customize authentication workflows using Lambda triggers. To customize a user pool workflow, you can create Lambda functions that are invoked by Amazon Cognito during various phases of the workflow. These functions can implement functionalities such as introducing authentication challenges, validating emails, sending confirmation messages, and other custom logic.

This solution uses an Amazon Cognito pre sign-up Lambda trigger to implement a real-time fraud detection system. The Lambda trigger is invoked before Amazon Cognito performs a new user sign-up, which lets us run validations and store the user information and the Amazon Fraud Detector rule outcome in a DynamoDB table. Because the function lets us run custom logic, we can also include validation of disposable emails or tumbling email addresses and subsequently assess the risk level of the user based on the rule outcome. The pre sign-up Lambda trigger lets us determine if the sign-up process should proceed normally, if additional validation steps (friction) should be introduced, or if the sign-up request should be denied.

The following diagram illustrates the logical flow of this function.

Logical flow of validations in an Amazon Cognito pre sign-up Lambda function for fraud prevention to filter disposable and tumbling email addresses and assess risk score using Amazon Fraud Detector
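The overall shape of such a pre sign-up trigger can be sketched as follows. The event fields used here (`request.userAttributes` and `response.autoConfirmUser`) are part of the Amazon Cognito pre sign-up trigger contract; everything else is an assumption for illustration. In particular, `assess` is a hypothetical callable that would wrap the email validations and the Amazon Fraud Detector call and return a tier label.

```python
def make_pre_signup_handler(assess):
    """Build a Lambda handler around assess(user_attributes) -> tier.

    `assess` is a hypothetical callable returning 'low', 'medium', or 'high'.
    """

    def handler(event, context):
        attributes = event["request"]["userAttributes"]
        tier = assess(attributes)

        if tier == "high":
            # Raising from the trigger makes Amazon Cognito reject the sign-up.
            raise Exception("Sign-up could not be completed")

        # Low risk: confirm immediately. Medium risk: leave unconfirmed so
        # Amazon Cognito requires the verification code (added friction).
        event["response"]["autoConfirmUser"] = tier == "low"
        return event

    return handler
```

Injecting `assess` keeps the handler easy to unit test without AWS credentials, for example `make_pre_signup_handler(lambda attrs: "low")`.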

User segmentations and journeys using Amazon Pinpoint

Amazon Pinpoint enables businesses to communicate with their customers using popular channels like email, SMS, voice, and push notifications. With Amazon Pinpoint, you can also create segments of marketing campaign audiences. Without early fraud prevention for sign-ups, businesses must analyze all user profiles with the same lens. Findings of such analyses are then used to create appropriate audience segments for new user marketing or promotional campaigns. This approach often introduces overhead that takes time away from effectively engaging with customers, especially when dealing with large volumes of user data. For example, businesses may want to run marketing and promotional campaigns for new users with low sign-up risk scores.

Events within the Amazon Cognito sign-up flow can also be sent to Amazon Pinpoint so businesses can create customer journeys. An Amazon Pinpoint journey, as illustrated in the following diagram, is a multi-step engagement experience that can be tailored to fit the overall marketing strategy of the business.

Segments of users by sign-up risk score, and the user sign-up event journey that can be set up in Amazon Pinpoint to drive additional functionality such as running effective marketing campaigns for trusted users with low sign-up risk scores
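As a rough illustration of how sign-up events and risk tiers could be attached to an Amazon Pinpoint endpoint for segmentation, the sketch below builds the request for the `update_endpoint` operation. The attribute names `signupRiskTier` and `signupEvent` are hypothetical, as are the function names; `pinpoint_client` is assumed to be a boto3 Pinpoint client (or any object exposing the same method).

```python
def build_endpoint_request(email: str, user_id: str, risk_tier: str, signup_event: str) -> dict:
    """Build an EndpointRequest carrying custom attributes for segmentation."""
    return {
        "ChannelType": "EMAIL",
        "Address": email,
        # Pinpoint endpoint attributes map a name to a list of string values.
        "Attributes": {
            "signupRiskTier": [risk_tier],   # hypothetical attribute name
            "signupEvent": [signup_event],   # hypothetical attribute name
        },
        "User": {"UserId": user_id},
    }


def record_signup_event(pinpoint_client, application_id: str, endpoint_id: str, request: dict):
    """Send the endpoint update to Amazon Pinpoint."""
    return pinpoint_client.update_endpoint(
        ApplicationId=application_id,
        EndpointId=endpoint_id,
        EndpointRequest=request,
    )
```

A segment for low-risk users could then filter on the risk tier attribute, assuming that attribute is populated consistently.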

Model retraining

Online web and mobile platforms may evolve based on changing business needs. Businesses may expand to new geographic locations, letting users sign up from uniquely different email domains and IP addresses. The online platform may start letting users sign up using their phone numbers. In such cases, it becomes important that the Online Fraud Insights model is retrained with a more recent dataset in order to minimize biased prediction outcomes.

You can retrain a new version of the Amazon Fraud Detector model by using the data captured in DynamoDB. Data from the DynamoDB table can be exported to Amazon Simple Storage Service (Amazon S3) using DynamoDB table export. The data in Amazon S3 can then be formatted using the data preparation guidance for Amazon Fraud Detector training data. When the retraining data is ready, a new Amazon Fraud Detector model version can be trained.
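The formatting step can be sketched as a small converter from exported records to a training CSV. This assumes the export uses the DynamoDB JSON format (one `{"Item": {...}}` object per line, with scalar attributes only) and that each record has been given a label after review. The attribute names `event_timestamp` and `label` are hypothetical, and unlabeled records are skipped because the model is supervised.

```python
import csv
import io
import json

FEATURES = [
    "ip_address", "email_address", "user_agent", "billing_state",
    "billing_postal", "billing_address", "phone_number",
]
COLUMNS = FEATURES + ["EVENT_TIMESTAMP", "EVENT_LABEL"]


def unwrap(attribute: dict) -> str:
    """Extract the value from a scalar DynamoDB JSON attribute, e.g. {"S": "x"}."""
    (_type_tag, value), = attribute.items()
    return str(value)


def export_lines_to_csv(lines, timestamp_attr="event_timestamp", label_attr="label") -> str:
    """Convert exported DynamoDB JSON lines into training CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if not line.strip():
            continue
        item = {name: unwrap(attr) for name, attr in json.loads(line)["Item"].items()}
        if label_attr not in item:
            continue  # unlabeled records cannot be used for supervised training
        row = {name: item.get(name, "") for name in FEATURES}
        row["EVENT_TIMESTAMP"] = item.get(timestamp_attr, "")
        row["EVENT_LABEL"] = item[label_attr]
        writer.writerow(row)
    return out.getvalue()
```

The resulting text can be written to Amazon S3 and checked against the data preparation guidance before training a new model version.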

Architecture overview

To demonstrate the solution, we trained an Amazon Fraud Detector model using a fictitious, synthetically generated sample dataset. We used an Amazon Cognito user pool custom authentication workflow to define the three different flows based on each of the Amazon Fraud Detector rule outcomes.

Low and high fraud risk sign-up flows

The following diagram shows the sign-up flow events. The Amazon Fraud Detector Online Fraud Insights ML model evaluates either a low risk or high risk outcome for the new user.

Registration flow architecture when low fraud risk or high fraud risk outcomes are detected by Amazon Fraud Detector, using Amazon Cognito and a pre sign up AWS Lambda function

Let’s walk through the flow:

  1. The user initiates a sign-up flow from the client application (web or mobile) by entering information such as name, email, postal address, phone, and desired password.
  2. The client invokes the Amazon Cognito user pool SignUp API by passing all the registration information along with the user’s public IP address and the client application’s User-Agent value.
  3. The client also sends the sign-up event to Amazon Pinpoint through the update-endpoint API.
  4. Amazon Cognito invokes the pre sign-up Lambda trigger with the user registration information, which includes all the variables needed for Amazon Fraud Detector to evaluate the user information.
  5. The Lambda trigger checks the email address against a predefined list of disposable email domains, and checks the email pattern for a tumbling email. If either check matches, it responds with an error back to Amazon Cognito, which stops the sign-up flow. The client application can display an appropriate message.
  6. If the email isn’t disposable or a tumbling email, the Lambda trigger makes a call to the Amazon Fraud Detector GetEventPrediction API with all the required variables. Amazon Fraud Detector then responds with the rule evaluation outcome and the model score used to reach it. The outcome and score, along with all other user attributes, are stored in a DynamoDB table.
  7. Next, the outcome value is used to decide whether to permit the sign-up or not.
    1. If the outcome is low risk, the Lambda function sets the autoConfirmUser parameter to true. Amazon Cognito automatically confirms the user, and the user is registered.
    2. If the outcome is high risk, Lambda throws an error and Amazon Cognito denies the user sign-up.
  8. Based on responses from Amazon Cognito, the client shows an appropriate message and sends a successful sign-up or a sign-up denied event to Amazon Pinpoint.
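Step 6 above can be sketched as a request builder and a response parser around `get_event_prediction`. The request keys and the `modelScores` and `ruleResults` response fields follow the GetEventPrediction API; the function names are hypothetical, `client` is assumed to be a boto3 Amazon Fraud Detector client, and using `"unknown"` as the entity ID for a user who has no account yet is an assumption. Storing the result in DynamoDB is left out.

```python
import uuid
from datetime import datetime, timezone


def build_prediction_request(detector_id, event_type, entity_type, variables,
                             entity_id="unknown", now=None) -> dict:
    """Build keyword arguments for get_event_prediction."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "detectorId": detector_id,
        "eventId": str(uuid.uuid4()),
        "eventTypeName": event_type,
        "eventTimestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumption: no stable entity ID exists before the account is created.
        "entities": [{"entityType": entity_type, "entityId": entity_id}],
        # Event variable values are passed as strings.
        "eventVariables": {name: str(value) for name, value in variables.items()},
    }


def parse_prediction(response: dict) -> tuple:
    """Return (outcomes, scores) from a get_event_prediction response."""
    scores = {}
    for model in response.get("modelScores", []):
        scores.update(model.get("scores", {}))
    outcomes = [
        outcome
        for rule in response.get("ruleResults", [])
        for outcome in rule.get("outcomes", [])
    ]
    return outcomes, scores


def evaluate_sign_up(client, request: dict) -> tuple:
    """Call Amazon Fraud Detector and return the parsed outcomes and scores."""
    return parse_prediction(client.get_event_prediction(**request))
```

The returned outcomes can then drive the decision in step 7, and both values can be persisted for later retraining.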

Medium fraud risk sign-up flow

The following diagram shows the sign-up flow events where the Online Fraud Insights ML model evaluates a medium risk outcome for the new user. In this case, friction is introduced in the sign-up flow by means of additional identity verification.

Registration flow architecture when a medium fraud risk outcome is detected by Amazon Fraud Detector using Amazon Cognito pre sign-up AWS Lambda function, Amazon API Gateway, AWS Lambda, and Amazon DynamoDB

To do a walkthrough of this flow, let’s assume that the new user sign-up has passed the disposable and tumbling email validation checks in the pre sign-up Lambda trigger.

  1. The Amazon Cognito Lambda trigger receives a medium risk outcome and score from Amazon Fraud Detector and stores this, along with all other user attributes, in the DynamoDB table.
  2. The Lambda trigger sets the autoConfirmUser parameter to false. Amazon Cognito automatically sends a verification code to the user’s email address. Note that Amazon Cognito can also send a verification code to the user’s phone number via SMS.
  3. The client application prompts the user to enter a verification code and (optionally) solve a CAPTCHA (implemented separately).
  4. The user enters the verification code to verify their identity. This identity verification step involves consecutive API calls.
    1. The first call is to Amazon Pinpoint through the update-endpoint API to record that an identity verification step has occurred.
    2. Next, a call is made to an Amazon API Gateway endpoint, which is backed by a Lambda function. This function checks whether the client’s public IP address or User-Agent has changed. For example, a user may have switched networks or changed browsers. If the function detects changes, it makes an additional GetEventPrediction call to get the new risk outcome and score.
  5. If the second prediction outcome and score are in the same range or better—that is, medium or low risk—the Lambda function sends an OK response to the client via the API Gateway endpoint.
  6. Next, the client sends the verification code to Amazon Cognito via the ConfirmSignUp API.
  7. Amazon Cognito confirms the user registration if the verification code entered by the user is valid.
  8. If the second prediction outcome changes to high risk, the Lambda function sends an error code to the client application via the API Gateway endpoint.
  9. The client stops the sign-up flow and displays a message to the user.
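The re-evaluation logic in the identity verification step can be sketched as a single function. This is a simplified illustration: `stored` stands for the record saved in DynamoDB at sign-up, `predict` is a hypothetical callable that wraps a second GetEventPrediction call and returns a tier label, and the 403 status code for the denied case is an assumption (the text only says an error code is returned).

```python
def verify_identity_step(stored: dict, current_ip: str, current_user_agent: str, predict) -> dict:
    """Decide whether the sign-up may proceed to code confirmation.

    stored  -- record saved at sign-up with 'ip_address', 'user_agent', 'risk_tier'
    predict -- hypothetical callable (ip, user_agent) -> 'low' | 'medium' | 'high'
    """
    changed = (
        current_ip != stored["ip_address"]
        or current_user_agent != stored["user_agent"]
    )
    # Only re-score when the client context differs from the original sign-up.
    tier = predict(current_ip, current_user_agent) if changed else stored["risk_tier"]

    if tier == "high":
        return {"statusCode": 403, "body": "Sign-up denied"}
    return {"statusCode": 200, "body": "OK"}
```

The returned dictionary uses the `statusCode` and `body` fields that an API Gateway Lambda proxy integration expects.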

Deployment prerequisites

The starter code for setting up this real-time sign-up flow using Amazon Cognito and the Amazon Fraud Detector GetEventPrediction API is available on GitHub. For this walkthrough, you must have the following prerequisites:

  • An AWS account
  • Access to an AWS account with administrator or power user (or equivalent) AWS Identity and Access Management (IAM) role policies attached with permissions for Amazon Fraud Detector, Amazon Cognito, Lambda, DynamoDB, API Gateway, and Amazon Pinpoint.

Set up Amazon Fraud Detector

To get started with setting up and testing Amazon Fraud Detector, complete the following steps:

  1. Build an Amazon Fraud Detector model – upload the training data, create events to evaluate fraud, and train and deploy the model.
  2. Create a detector to generate real-time fraud predictions – add the model to the detector, and create and configure rules.

Set up an Amazon Cognito custom authentication workflow

Detailed step-by-step instructions on how to deploy the custom sign-up workflow are available in the GitHub repository. The repository consists of an AWS Cloud Development Kit (AWS CDK) application that deploys all the necessary AWS resources. The high-level steps are as follows:

  1. Create a Lambda function required to customize the Amazon Cognito user pool authentication workflow.
  2. Create an Amazon Cognito user pool and assign the Lambda function as the pre sign-up Lambda trigger.
  3. Create a DynamoDB table, Lambda function, and API Gateway endpoints for the identity verification step.
  4. Create an Amazon Pinpoint project.

You can use Amazon Cognito APIs via the AWS SDK (available for languages such as JavaScript, Java, and .NET) and use API Gateway endpoints as REST endpoints to configure the sign-up or registration flow in your web or mobile app. Alternatively, you can use the AWS Amplify SDK Auth, API, and Analytics modules to integrate Amazon Cognito, API Gateway, and Amazon Pinpoint with your application.
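For a sense of what the client-side calls look like, the sketch below builds the arguments for the Amazon Cognito `sign_up` and `confirm_sign_up` operations as they would be passed to a boto3 `cognito-idp` client. Passing the IP address and User-Agent through `ClientMetadata` is an assumption made for this example; the GitHub repository may transmit them differently. The function names and metadata keys are hypothetical.

```python
def build_sign_up_request(client_id, email, password, profile, ip_address, user_agent) -> dict:
    """Build keyword arguments for the Amazon Cognito sign_up call."""
    attributes = [{"Name": "email", "Value": email}]
    attributes += [{"Name": name, "Value": value} for name, value in profile.items()]
    return {
        "ClientId": client_id,
        "Username": email,
        "Password": password,
        "UserAttributes": attributes,
        # Assumption: ClientMetadata carries the request context to the
        # pre sign-up trigger; the key names are illustrative.
        "ClientMetadata": {"ip_address": ip_address, "user_agent": user_agent},
    }


def build_confirm_request(client_id, email, code) -> dict:
    """Build keyword arguments for the Amazon Cognito confirm_sign_up call."""
    return {"ClientId": client_id, "Username": email, "ConfirmationCode": code}
```

With a boto3 client, these would be used as `client.sign_up(**request)` and `client.confirm_sign_up(**request)`.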

Clean up

To avoid incurring future charges, delete the resources created for the solution.

  1. Follow the instructions provided in the GitHub repository to clean up resources created by the AWS CDK application.
  2. On the Amazon Fraud Detector console, manually delete all related resources.

Conclusion

This post demonstrated how you can implement a real-time fraud prevention system by preventing fake account creation with AI using Amazon Fraud Detector. I discussed how to mitigate different fraud attack vectors by customizing authentication workflows in Amazon Cognito using Lambda functions. This solution helps businesses take steps towards building an AI-powered fraud prevention system for their web and mobile platforms. Fully managed AWS services such as Amazon Fraud Detector, Amazon Cognito, and Amazon Pinpoint help make the solution cost-effective by reducing operational overhead. This solution is also customizable to support mitigation of emerging fraud attack vectors. Early fraud prevention helps reduce the time businesses spend analyzing user behavior to identify fraud in their platforms, so they can focus more on driving business value. To learn more about how Amazon Fraud Detector can help your business, visit the webpage!


About the Author

Anjan Biswas

Anjan Biswas is a Senior Solutions Architect with a focus on AI/ML, Data Analytics, and enterprise applications. Anjan works with enterprise customers and is passionate about developing, deploying, and explaining AI/ML, Data Analytics, and Big Data solutions. Anjan has over 14 years of experience working with global supply chain, manufacturing, and retail organizations and is actively helping customers get started and scale on AWS.