AWS Database Blog
Build an ultra-low latency online feature store for real-time inferencing using Amazon ElastiCache for Redis
Over the last several years, the growth of machine learning (ML) has changed how businesses operate at their core, forcing conversations about how to tightly integrate ML into critical decision points for users. ML can help businesses by improving customer interactions, boosting sales, and improving operating efficiency. This is reflected in investment patterns: investment in machine learning is expected to grow to over $209 billion by 2029, a 38% year-over-year growth rate.
This growing popularity, compounded with the acceleration of data generation (approximately 120 zettabytes expected in 2023, a 51% increase over 2 years), highlights the need to process data at greater speeds and volumes to enable faster decision-making. As this need continues to increase, suitable, cost-effective infrastructure is paramount to providing scalable deployment of ML functionality to users. This means customers need to focus on ease of deployment of ML models, model monitoring, lifecycle management, and model governance. Each of these areas requires significant operational investment from an organization to support production-level ML models.
In this post, we explore how our customers are building online feature stores on AWS with Amazon ElastiCache for Redis to support mission-critical ML use cases that require ultra-low latency. We cover a reference architecture, including a sample use case based on a real-time loan approval application that makes online predictions based on a customer’s credit scoring model using features stored in an online feature store powered by ElastiCache for Redis.
Overview of feature stores
Feature stores are one of the most important pieces of infrastructure in the ML space because they ease model deployment. They provide a single source of available features for training and inferencing use cases. A feature store operates as a centralized repository from which a model can retrieve features in a standardized format, acting as a data transformation service that makes these features ready to use.
Customers such as Amazon Music look to Amazon ElastiCache to serve this need because it provides a trusted, scalable, enterprise-grade infrastructure to host deployment of their ML models.
“Amazon Music Feature Repository (MFR) is a fully managed ML feature store utilized to store, share, and manage ML features to power personalization. MFR supports high throughput at low latency for online inference and is used repeatedly by dozens of teams across Amazon Music to maintain a consistent personalized experience. ElastiCache provides low latency storage and retrieval for 1 TB of ML features and is simple for the MFR team to manage for peak traffic. MFR with ElastiCache raises the bar for consuming teams with strict latency requirements by delivering batches of ML features at low latencies across North America, European Union and Far East regions (Music Search, Voice/Echo).”
–David Follmer, Software Development Manager, Amazon Music
There are two types of feature stores that are commonly used:
- Offline feature stores – These typically store and process historical data for model training and batch scoring at scale, and store data in systems such as Amazon Simple Storage Service (Amazon S3). They use features that take a significant amount of time to generate and are less latency sensitive.
- Online feature stores – These typically need features calculated extremely fast with low latency, often in the single-digit millisecond range. They require fast computations and access to data, often through in-memory datastores for real-time predictions. Low latency feature store examples include serving features to ad personalization, real-time delivery times, cargo loading and unloading, loan approvals, or anomaly detection such as credit card fraud.
ElastiCache for Redis excels as an online feature store because of its in-memory capabilities providing extreme performance needed for modern real-time applications.
ElastiCache as a low latency online feature store
ElastiCache for Redis is a fast in-memory data store that provides sub-millisecond latency to power real-time applications at scale. In-memory data stores such as Redis provide low latencies along with hundreds of thousands of reads per second per node. According to a Feast benchmark, Redis performs 4–10 times better than other datastores in feature store use cases.
Many AWS customers such as United Airlines already use ElastiCache as an ultra-low latency online feature store to support use cases such as personalized experiences, electronic cargo loading, and more.
“United Airlines, one of the largest airlines globally, relies on a centralized machine learning platform with a feature store at its core. The airline utilizes machine learning models to deliver personalized experiences to customers. With a significant customer base generating millions of daily requests and strict service level agreement (SLA) requirements for quick response times, selecting the right feature store was crucial.
Among various options, United Airlines found ElastiCache for Redis to be particularly suitable due to its ability to provide ultra-low latency for millions of customers. Since features need to be constantly available in-memory and have the latest values for accurate predictions by the ML models, they are stored in a feature store. ElastiCache verifies that these features are updated as frequently as possible, enabling accurate predictions.
United Airlines employs ElastiCache as its online feature store for serving real-time traffic. An advantage of using ElastiCache is its support for global data stores, allowing the ML model to serve traffic from across multiple AWS Regions. This capability provides a robust and fail-safe plan to serve customers and offer personalized recommendations, such as selecting the right destination or finding the appropriate products during the check-in process.
In conclusion, we recommend ElastiCache for real-time traffic scenarios where low latency is essential for serving data efficiently.”
–Guillermo Garcia, Senior Manager ML Engineering, United Airlines & Kumar Gaurav, Senior Solution Architect, United Airlines
Advantages of ElastiCache for Redis
Customers choose ElastiCache for its high performance, the fact that it’s fully managed, its high availability and reliability, its scalable architecture, and its security controls.
High performance
ElastiCache is a fully managed in-memory caching service that scales to millions of operations per second with sub-millisecond read and write response times, which is typically not possible with disk-based systems. This is further improved in ElastiCache for Redis 7 with support for enhanced I/O multiplexing, which delivers significant improvements to throughput and latency at scale. Enhanced I/O multiplexing provides 72% increased throughput (read and write operations per second) and up to 71% decreased P99 latency when compared to previous versions.
Customers such as Swiggy, a popular online food ordering and delivery platform in India, achieved success in building a highly scalable, performant feature store, serving millions of customers in an extremely low latency way.
“Swiggy faced the challenge of managing large amounts of feature data when building ML models. In this process, the data grows quickly to billions of records, with millions being actively retrieved during model inference, all while operating under low latency constraints.
Swiggy leveraged ElastiCache as the primary feature store for ML models by building an automated ingestion pipeline and online inferencing engine.
Swiggy has benefitted from ElastiCache primarily for low latency, multiple data structure support, and a highly scalable system (50 million queries per second). With great support from the ElastiCache team, Swiggy was able to manage the feature store in a cost-effective way.”
–Soumya Simanta, Vice President, Engineering and ML Platforms, Swiggy
Fully managed
All the management tasks such as hardware provisioning, software patching, setup, configuration, monitoring, failure recovery, and backups are handled by ElastiCache. ElastiCache continuously monitors your clusters to keep them up and running so that you can focus on higher-value application development.
Highly available and reliable
ElastiCache clusters are deployed across multiple Availability Zones with automatic failover mechanisms to detect primary node failures and promote replicas to become primaries with minimal impact to your application.
Scalable architecture
ElastiCache is designed to support online cluster resizing without downtime, so you can scale your clusters in and out up to 500 shards, capable of supporting 310 TiB of in-memory data, or 982 TiB when using clusters with data tiering.
Security controls
ElastiCache allows you to simplify your architecture while maintaining security boundaries, and to take advantage of granular access control to manage groups using role-based access control (RBAC). You can use AWS Identity and Access Management (IAM) identities to connect to ElastiCache. ElastiCache offers encryption in transit and at rest using customer-managed keys stored in AWS Key Management Service (AWS KMS), along with Redis AUTH, to help keep sensitive data such as personally identifiable information (PII) safe. Finally, ElastiCache for Redis supports compliance programs such as SOC 1, SOC 2, SOC 3, ISO, MTCS, C5, PCI DSS, HIPAA, and FedRAMP.
Solution overview
For this post, we use an open-source feature store framework called Feast, which provides the underlying infrastructure for loading features into an offline feature store and materializing features to online feature stores. This is used for model inferencing and accelerates deployment of production-grade feature stores. Feast helps ML platform teams productize their real-time models by making collaboration between engineers and data scientists more efficient.
Feast internally manages two sets of feature stores: an offline feature store (such as Amazon Redshift) to store and process historical data for model training and batch scoring at scale, and an online store (such as Amazon ElastiCache for Redis) for real-time predictions.
Credit scoring is a mechanism used by creditors and lenders to assess customers’ risk and their likelihood to repay or default on a loan. For this use case, a real-time system accepts a loan request from a customer and responds within 10 milliseconds with a decision to accept or reject the loan. This real-time system makes online predictions based on a customer’s credit scoring model using features stored in an online feature store powered by ElastiCache for Redis.
The following diagram illustrates the solution architecture.
The following are the key components of the architecture used for this use case:
- Loan dataset – This contains historical loan data of current customers with a status indicating whether the customer defaulted on a loan.
- Amazon Simple Storage Service (Amazon S3) – This is the primary data source with credit history features and zip code features stored in an S3 bucket.
- Amazon Redshift as an offline feature store – Amazon Redshift is a cloud data warehouse that uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price performance at scale. Historical loan data itself is not sufficient for making predictions for new customers. For credit scoring model predictions, historical loan data must be joined with zip code and credit history features. In our architecture, Amazon Redshift is configured as an offline feature store by Feast to enrich the historical loan data with credit history and zip code features stored in Amazon S3 and queried through Amazon Redshift. This data is used for producing training datasets.
- ElastiCache for Redis as an online feature store – ElastiCache for Redis is an in-memory key-value data store that provides sub-millisecond latencies to power internet-scale real-time applications. In our use case, we use ElastiCache for Redis as an online feature store to serve features at ultra-fast response times for online inferencing and making real-time predictions using the credit scoring model.
- Feast – Feast is the central processing system of this architecture. The core function of Feast is to register feature definitions into the Feast registry, which can later be used for training and online inference. Feast is used during model training to enrich the data with features stored in the offline feature store powered by Amazon Redshift. Finally, Feast is also used to materialize the features to an online feature store powered by ElastiCache for online inferencing.
We use the following Feast components:
- Feast registry – An object store used to store feature definitions that are registered with the feature store. Applications can discover feature data by interacting with the registry through the Feast SDK.
- Feast Python SDK/CLI – We use the Feast SDK to:
  - Manage feature definitions with version control.
  - Load feature values into the online store.
  - Build and retrieve training datasets from the offline store.
  - Retrieve online features.
- Feast materialization engine – The materialization engine component launches a process that loads data into the online store from the offline store.
Deploy the Feast infrastructure for our use case
First, we deploy the Feast infrastructure on an Amazon Elastic Compute Cloud (Amazon EC2) instance. As part of that, we install the Feast SDK and CLI using pip with the AWS and Redis dependencies.
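With recent Feast releases, the aws and redis extras pull in the Redshift, Amazon S3, and Redis client dependencies (a representative command; pin versions as needed):

```
pip install 'feast[aws,redis]'
```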
Next, we configure Feast for our use case.
Create a feature repository
Feast configuration is written declaratively and stored as code in a central location called a feature repository. The feature repository defines what the desired state of a feature store should be. A feature repository is a directory that contains the configuration of the feature store and individual features. This configuration is written as code (Python or YAML).
Feast manages two important sets of configurations:
- A configuration describing how to run Feast on your infrastructure, in the feature_store.yaml file. This YAML file configures the key overall architecture of the feature store. The provider value in the YAML file sets the default offline and online stores.
- A collection of Python files containing feature declarations.
The simplest way to create a new feature repository is to use the feast init command:
```
feast init fs_project_repo

Creating a new Feast repository in /home/ec2-user/fs_project_repo.
```
feast init creates a new feature repository scaffold with the following configuration files:
- feature_store.yaml – Contains a sample setup configuring data sources and the online store
- example_repo.py – Contains sample feature definitions
Configure ElastiCache as an online feature store and Amazon Redshift as an offline feature store
To configure Feast to read and write from both ElastiCache and Amazon Redshift, we modify the feature_store.yaml file.
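The following is a representative configuration; the exact keys vary slightly across Feast versions, and the ElastiCache endpoint, Redshift cluster ID, Region, S3 staging location, account ID, and IAM role shown here are placeholders:

```yaml
project: fs_project_repo
registry: data/registry.db
provider: aws
online_store:
  type: redis
  # Use redis_type: redis_cluster for cluster mode enabled ElastiCache clusters
  redis_type: redis_cluster
  connection_string: "my-cluster.xxxxxx.clustercfg.use1.cache.amazonaws.com:6379,ssl=true"
offline_store:
  type: redshift
  cluster_id: my-redshift-cluster
  region: us-east-1
  database: dev
  user: admin
  s3_staging_location: s3://my-feast-bucket/staging
  iam_role: arn:aws:iam::123456789012:role/redshift-s3-access
```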
Manage feature definitions
The next step of the Feast deployment is to create a feature definition file for managing feature definitions. The following are key components of the feature definition file:
- Defined entities – Entities are a collection of related features that map to a specific domain. In our use case, entities could be zip codes or SSNs that map to the credit history of users.
- Defined sources – A data source refers to the raw underlying data that users own (for example, in a table in Amazon Redshift). Feast doesn’t manage any of the raw underlying data but instead is in charge of loading this data and performing different operations on it to retrieve or serve features. In Feast, a data source is associated with a corresponding offline feature store. For our use case, we have defined Amazon Redshift as our offline feature store. Data sources for Amazon Redshift are either tables or views. These can be specified either by a table reference or a SQL query.
- Feature views – A feature view is a logical grouping of time-series feature data as defined in a data source. Feature views consist of a data source, zero or more entities, a name to uniquely identify the feature view in the project, a schema that specifies one or more features, and metadata such as tags and TTL that determines how far back Feast will scan for fetching historical datasets.
Feature views are used for the following:
- Generating training datasets by querying the data source of feature views in order to find historical feature values. A training dataset may contain features from multiple feature views.
- Materializing feature values into an online store. Feature views determine the schema in the online store.
- Retrieving features from the online store. Feature views provide schema definitions to look up features from the online store.
The following is a sample feature definition file for our use case. Here, we define two data sources, zipcode_features and credit_history, corresponding to the two tables in Amazon Redshift (which in turn are sourced from files in Amazon S3).
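The following sketch assumes a recent Feast API (Entity, FeatureView, Field, and RedshiftSource); the schema fields and TTLs shown are illustrative rather than an exhaustive definition:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.infra.offline_stores.redshift_source import RedshiftSource
from feast.types import Int64, String

# Entities map features to a specific domain (zip codes and SSNs in our use case)
zipcode = Entity(name="zipcode", join_keys=["zipcode"])
dob_ssn = Entity(name="dob_ssn", join_keys=["dob_ssn"])

# Data sources point at the two Amazon Redshift tables
zipcode_source = RedshiftSource(
    name="zipcode_features_source",
    table="zipcode_features",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

credit_history_source = RedshiftSource(
    name="credit_history_source",
    table="credit_history",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp",
)

# Feature views group time-series feature data from a data source
zipcode_features = FeatureView(
    name="zipcode_features",
    entities=[zipcode],
    ttl=timedelta(days=3650),
    schema=[
        Field(name="city", dtype=String),
        Field(name="state", dtype=String),
        Field(name="population", dtype=Int64),
    ],
    source=zipcode_source,
)

credit_history = FeatureView(
    name="credit_history",
    entities=[dob_ssn],
    ttl=timedelta(days=90),
    schema=[
        Field(name="credit_card_due", dtype=Int64),
        Field(name="mortgage_due", dtype=Int64),
        Field(name="missed_payments_2y", dtype=Int64),
    ],
    source=credit_history_source,
)
```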
Register feature definitions and deploy your feature store
Now that we have defined our online and offline feature stores and feature definitions, the next step is to register the feature definitions along with the underlying infrastructure into a Feast registry.
The feast apply command reads the feature definition file in the current directory for feature view and entity definitions, registers the objects, and deploys the infrastructure. In doing so, it sets up the online feature store on ElastiCache for Redis and the offline feature store on Amazon Redshift.
Specifically, the feast apply command does the following:
- Stores entity and feature view definitions in a local file called registry.db
- Sets up ElastiCache for Redis as an online feature store for serving zip code and credit history features
- Enables data sources on Amazon Redshift for point-in-time enrichment of historical loan data with feature data
The following is sample output (the exact wording varies by Feast version):
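```
Created entity zipcode
Created entity dob_ssn
Created feature view zipcode_features
Created feature view credit_history

Deploying infrastructure for zipcode_features
Deploying infrastructure for credit_history
```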
At this point, no data has been materialized to your online store. feast apply simply registers the feature definitions with Feast and spins up any necessary infrastructure, such as tables.
Generate training data
To train a model, we need features and labels. For our use case, we have a loans table that stores historical data and another set of tables, zipcode_features and credit_history, with feature values.
Feast can help generate feature vectors that map features to labels. Feast needs a list of entities (for example, zip code and SSN) and timestamps. Feast will intelligently join relevant tables to create the relevant feature vectors.
First, we load the loan dataset that has features of historical loan data of current customers:
```python
import pandas as pd

# Get historical loan data
loans = pd.read_parquet("data/loan_table.parquet")
```
We are reading the loan dataset from a Parquet file stored in a local data directory. Ideally, these can be stored in an object store such as Amazon S3.
This dataset doesn’t contain all the features we need in order to make an accurate scoring prediction. We must enrich this dataset by joining our zip code and credit history features from an offline datastore accurately using point-in-time joins.
The first step in this process is to instantiate a feature store object pointing to our feature repository.
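A minimal sketch, assuming the fs_project_repo repository created earlier:

```python
from feast import FeatureStore

fs = FeatureStore(repo_path="fs_project_repo")
```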
Then we identify the features we want to query from Feast. We do this by defining feature references (for example, credit_history:credit_card_due) for the features that we want to retrieve from the offline store. These features can come from multiple feature tables.
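For example (the feature names other than credit_history:credit_card_due are illustrative):

```python
feast_features = [
    "zipcode_features:city",
    "zipcode_features:state",
    "zipcode_features:population",
    "credit_history:credit_card_due",
    "credit_history:mortgage_due",
    "credit_history:missed_payments_2y",
]
```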
We then query Feast to enrich our loan dataset. Feast will automatically detect the zipcode and dob_ssn join columns and intelligently join features that were available at the time the loan was active. This is done by creating an entity dataframe and launching historical data retrieval into the dataframe.
An entity dataframe is the target dataframe on which you would like to join feature values. The entity dataframe must contain a timestamp column called event_timestamp and all entities (primary keys) necessary to join feature tables onto it. The entities found in feature views that are being joined onto the entity dataframe must be present as columns on the entity dataframe.
After the feature references and an entity dataframe are defined, we run get_historical_features(). This method runs a point-in-time join of features from the offline store onto the entity dataframe. When it’s complete, a job reference is returned, which can be converted to a Pandas DataFrame by calling to_df(); this DataFrame is later used for training and inferencing.
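A sketch of the retrieval call, reusing the loans dataframe (which contains the event_timestamp, zipcode, and dob_ssn columns) and the feast_features list defined earlier:

```python
# Point-in-time join of offline features onto the loan records
training_df = fs.get_historical_features(
    entity_df=loans,
    features=feast_features,
).to_df()
```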
Train the credit scoring model
After we have retrieved the complete training dataset, we have a few more preprocessing steps to run on the dataset before it’s ready for training.
- Encode categorical features.
- Drop the entity and timestamp columns.
- Split the training dataframe into train, validation, and test sets.
- Train the model classifier (see the sketch following this list).
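A minimal sketch of these steps, assuming a scikit-learn decision tree classifier; the categorical and entity column names are illustrative and depend on your dataset:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Encode categorical features (column names are illustrative)
categorical_features = ["person_home_ownership", "loan_intent", "city", "state"]
encoder = OrdinalEncoder()
training_df[categorical_features] = encoder.fit_transform(
    training_df[categorical_features]
)

# Drop the label, entity, and timestamp columns
train_x = training_df.drop(
    columns=["loan_status", "event_timestamp", "zipcode", "dob_ssn"]
)
train_y = training_df["loan_status"]

# Split into train, validation, and test sets
x_train, x_rest, y_train, y_rest = train_test_split(train_x, train_y, test_size=0.3)
x_val, x_test, y_val, y_test = train_test_split(x_rest, y_rest, test_size=0.5)

# Train the classifier
classifier = DecisionTreeClassifier()
classifier.fit(x_train, y_train)
```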
After our model is trained, it’s ready to be used for online inferencing. For online inferencing, we need a database that can provide ultra-fast response times at scale and is capable of generating high throughput. Offline feature stores will not be sufficient to meet these requirements and therefore aren’t used for online inferencing. Real-time predictions are often generated using online feature stores powered by databases that can deliver ultra-fast response times at scale.
Ingest batch features into the ElastiCache for Redis online feature store
Before we can make online loan predictions with our credit scoring model, we must populate our online store with feature values. Feast allows users to load their feature data into an online store in order to serve the latest features to models for online prediction through a process called materialization.
ElastiCache for Redis is great for feature stores not only because of its microsecond performance, but also its support for native data structures and its programming flexibility. Here, we can model features as hashes and customize the data format to our needs. When feature data is stored in Redis, we use it as a two-level map by utilizing Redis hashes:
- The first level of the map contains the Feast project name and entity key. The entity key is composed of entity names and values.
- The second level key (in Redis terminology, this is the field in a Redis hash) contains the feature table name and the feature name, and the Redis hash value contains the feature value.
The following diagram illustrates this setup.
To load feature data into your ElastiCache for Redis online store for online predictions, run the materialize-incremental command. This command queries the offline store for the feature views over the provided time range and loads the latest feature values into the configured online store, up to the provided $CURRENT_TIME. The materialize-incremental command can be run periodically as more data becomes available in order to keep the online store fresh.
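For example, from the feature repository directory:

```
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```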
Read features from ElastiCache for online inferencing
Now we have everything we need to make a loan prediction. At inference time, we need to quickly read the latest feature values from the online feature store using get_online_features().
The following is a sample loan request object for submitting a loan application (the field names and values are illustrative):
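```python
# Field names and values are illustrative
loan_request = {
    "zipcode": [76104],
    "dob_ssn": ["19630621_4278"],
    "person_age": [65],
    "person_income": [59000],
    "person_home_ownership": ["RENT"],
    "person_emp_length": [12.0],
    "loan_intent": ["PERSONAL"],
    "loan_amnt": [35000],
    "loan_int_rate": [16.02],
}
```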
For online inferencing, we do the following:
- Get features from an online feature store.
- Join these features with the loan request object.
- Run preprocessing steps such as applying encodings, sorting columns, and dropping any unnecessary columns.
- Finally, make the prediction.
The following sketch puts these steps together, reusing the feature store object, feature references, encoder, and trained classifier from the previous steps; exact column handling depends on your training dataframe:
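```python
import pandas as pd

# 1. Get the latest feature values for this applicant from ElastiCache
feature_vector = fs.get_online_features(
    entity_rows=[
        {
            "zipcode": loan_request["zipcode"][0],
            "dob_ssn": loan_request["dob_ssn"][0],
        }
    ],
    features=feast_features,
).to_dict()

# 2. Join the online features with the loan request
features = {**loan_request, **feature_vector}
features_df = pd.DataFrame.from_dict(features)

# 3. Apply the training-time encodings and align columns with the training set
features_df[categorical_features] = encoder.transform(
    features_df[categorical_features]
)
features_df = features_df[x_train.columns]

# 4. Make the prediction (loan_status: 0 = approved, 1 = rejected)
prediction = classifier.predict(features_df)
```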
We now have a fully functional credit scoring system that is capable of making real-time predictions, using a credit scoring model trained with Feast on feature stores powered by ElastiCache for Redis and Amazon Redshift.
Clean up your resources
To avoid unnecessary charges to your AWS account, please delete the resources you created for this post if you no longer need them. If deployed using AWS CloudFormation, you can delete the stack via the AWS CloudFormation console.
Conclusion
ElastiCache is a fully managed in-memory datastore provided by AWS and is a popular choice for building online feature stores for ML model inferencing. With ElastiCache, you can store and retrieve precomputed or frequently accessed features that are used as inputs for your ML models. As discussed in this post, there are several reasons why ElastiCache is a popular choice as an online feature store:
- High performance – ElastiCache for Redis is an in-memory data store, providing extremely fast read and write operations. This makes it ideal for real-time inferencing, where low latency is crucial to process predictions quickly.
- Data storage – You can use the flexible in-memory data structures of ElastiCache, including strings, hashes, lists, sets, and sorted sets, to store and retrieve model features efficiently, enabling flexible and optimized data modeling for ML use cases.
- Feature generation – You can preprocess and generate the features needed for model inferencing. These features can be extracted from various data sources like databases, APIs, or streaming data. Once generated, you can store them in ElastiCache for quick access.
- Scalability – ElastiCache is designed to scale horizontally, allowing you to handle increased workload demands. As your ML inferencing grows, you can add more nodes to the ElastiCache cluster to handle the increased feature storage and retrieval requirements.
ElastiCache, together with solutions like Amazon SageMaker Feature Store and Feast, provides a comprehensive framework for feature versioning, feature management, and feature serving at scale. Amazon SageMaker is a fully managed service that provides a complete environment for building, training, and deploying ML models. It integrates well with other AWS services and allows you to easily deploy and scale your models for inferencing.
To get started with Amazon ElastiCache, refer to our Getting started guide. To learn how our customers are using Amazon ElastiCache at scale, refer to ElastiCache customer use cases. For more prescriptive guidance, reach out to your AWS account team to schedule deep-dive sessions with the Amazon ElastiCache specialist team.
We have adapted the concepts from this post into a deployable solution, now available as Guidance for Ultra-Low Latency, Machine Learning Feature Stores on AWS in the AWS Solutions Library. To get started, review the architecture diagrams and the corresponding AWS Well-Architected framework, then deploy the sample code to implement the Guidance into your workloads.
About the authors
Siva Karuturi is a Worldwide Specialist Solutions Architect for In-Memory Databases based out of Dallas, TX. Siva specializes in various database technologies (both Relational & NoSQL) and has been helping customers implement complex architectures and providing leadership for In-Memory Database & analytics solutions including cloud computing, governance, security, architecture, high availability, disaster recovery and performance enhancements. Off work, he likes traveling and tasting various cuisines Anthony Bourdain style!
Sanjit Misra is a Senior Technical Product Manager on the Amazon ElastiCache and MemoryDB team based in Seattle, WA. For the last 15+ years, he has worked in product and engineering roles related to data, analytics, and AI/ML. He has an MBA from Duke University and a bachelor’s degree from the University of North Carolina – Chapel Hill. In his spare time, he’s an avid sports fan and loves to spend time outdoors with friends and family.
Smita Srivastava is a Solutions Architect at Amazon Web Services, assisting digital-native companies in cultivating innovation. With her experience, she guides companies in their growth journey and translates their ideas into reality, with a particular emphasis on AI/ML leveraging AWS services. Beyond her profession, she’s an avid traveler, a book enthusiast, and a culinary explorer.