AWS Machine Learning Blog
Amazon Personalize improvements reduce model training time by up to 40% and latency for generating recommendations by up to 30%
This blog post was last reviewed and updated April, 2022 with database schema updates.
We’re excited to announce new efficiency improvements for Amazon Personalize. These improvements decrease the time required to train solutions (the machine learning models trained with your data) by up to 40% and reduce the latency for generating real-time recommendations by up to 30%.
Amazon Personalize enables you to build applications with the same machine learning (ML) technology used by Amazon.com for real-time personalized recommendations—no ML expertise required. Amazon Personalize provisions the necessary infrastructure and manages the entire ML pipeline, including processing the data, identifying features, using the best algorithms, and training, optimizing, and hosting the models.
When serving recommendations, minimizing the time your system takes to generate and serve a recommendation improves conversion. A 2017 Akamai study shows that every 100-millisecond delay in website load time can hurt conversion rates by 7%.[1] All other things being equal, lower latency is better. Our efficiency improvements have generated latency reductions of up to 30% for user recommendations across the full range of item catalogs supported in Amazon Personalize.
As your datasets grow and your users’ behavior changes, regular retraining is needed to keep your recommendations relevant. Solution training is one of the three cost drivers when using Amazon Personalize and can be a significant portion of your overall cost of ownership for Amazon Personalize. Improved training efficiency in Amazon Personalize reduces the cost of training solutions and increases the speed at which you can deploy new recommendation solutions for your users. New solution versions ensure that your Amazon Personalize model includes the most recent user events and that new items in your catalog are included in your personalized recommendations. The relative popularity of items changes as user preferences shift and when your catalog changes. Now, you can maintain the relevance of your recommendations at a lower cost and in less time.
The following sections walk you through how to use Amazon Personalize.
Creating dataset groups and datasets
When you get started with Amazon Personalize, the first step is to create a dataset group and import data about your users, your item catalog, and your users’ interaction history with those items. Each dataset group contains three distinct datasets: user-item interaction data, item data, and user data. If you don’t have historical data, or if you want to generate the most relevant recommendations based on in-session behavior, you can record real-time user-item interactions (events) using the putEvents API. You can add new item and user records incrementally to your item and user datasets using the putItems and putUsers APIs, so that your model reflects recent user actions and the most current item and user data is available when you update or retrain your solutions.
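As a rough sketch of the three incremental APIs mentioned above, the following helpers wrap the boto3 `personalize-events` operations `put_events`, `put_items`, and `put_users`. The helper names, the `click` event type, and the property keys in the usage note are illustrative assumptions, not part of this post's dataset. Each helper takes the client as an argument so the code does not depend on credentials to load.

```python
import json
from datetime import datetime, timezone


def record_event(events_client, tracking_id, user_id, session_id, item_id,
                 event_type="click"):
    """Record one real-time interaction with PutEvents.

    event_type defaults to "click", which is a hypothetical value; use the
    event types your own interaction dataset contains.
    """
    return events_client.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[{
            "eventType": event_type,
            "itemId": item_id,
            "sentAt": datetime.now(timezone.utc),
        }],
    )


def add_item(events_client, items_dataset_arn, item_id, properties):
    """Add one item record with PutItems; properties is sent as a JSON string."""
    return events_client.put_items(
        datasetArn=items_dataset_arn,
        items=[{"itemId": item_id, "properties": json.dumps(properties)}],
    )


def add_user(events_client, users_dataset_arn, user_id, properties):
    """Add one user record with PutUsers; properties is sent as a JSON string."""
    return events_client.put_users(
        datasetArn=users_dataset_arn,
        users=[{"userId": user_id, "properties": json.dumps(properties)}],
    )
```

To call these against your account, you would pass `boto3.client("personalize-events")` as `events_client`, along with the tracking ID of your event tracker and the ARNs of your own datasets. The property keys you send (for example a hypothetical `{"genres": "Comedy"}`) need to correspond to fields in your dataset schema.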
Creating an interaction dataset
Use the Amazon Personalize console to create an interaction dataset with the following schema, and import the file bandits-demo-interactions.csv, which is a synthetic movie rating dataset:
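If you prefer to register the schema programmatically, a minimal sketch might look like the following. It lists only the three fields that Amazon Personalize requires for an interactions dataset (`USER_ID`, `ITEM_ID`, `TIMESTAMP`). The columns of bandits-demo-interactions.csv are not enumerated in this post, so treat the field list as an assumption and add a field for every additional column in your file. The schema name is also a hypothetical placeholder.

```python
import json

# Minimal interactions schema: only the fields Amazon Personalize requires.
# Add one entry per extra column in your CSV (assumption: yours may differ).
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def create_interactions_schema(personalize_client,
                               name="bandits-demo-interactions-schema"):
    """Register the schema with CreateSchema and return its ARN.

    The default name is a hypothetical placeholder.
    """
    response = personalize_client.create_schema(
        name=name,
        schema=json.dumps(INTERACTIONS_SCHEMA),
    )
    return response["schemaArn"]
```

Here `personalize_client` would be `boto3.client("personalize")`. The schema is passed as a JSON string, which is why the dictionary is serialized with `json.dumps` before the call.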
Creating an item dataset
You follow similar steps to create an item dataset and import your data using bandits-demo-items.csv, which has metadata for each movie. We use an optional reserved keyword, CREATION_TIMESTAMP, for the item dataset, which helps Amazon Personalize compute the age of the item and adjust recommendations accordingly.
If you don’t provide the CREATION_TIMESTAMP, the model infers this information from the interaction dataset and uses the timestamp of the item’s earliest interaction as its corresponding release date. If an item doesn’t have an interaction, its release date is set as the timestamp of the latest interaction in the training set and it is considered a new item with age 0.
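The fallback rule described above can be illustrated in a few lines of Python. This is only a sketch of the rule as stated in this post, not the internal implementation of Amazon Personalize; the function names and the toy data in the usage note are made up for illustration.

```python
def infer_creation_timestamps(item_ids, interactions, provided=None):
    """Return a creation timestamp for every item.

    interactions: iterable of (item_id, timestamp) pairs.
    provided: optional dict of explicitly supplied CREATION_TIMESTAMP values.
    """
    provided = provided or {}
    earliest = {}
    latest = None
    for item_id, ts in interactions:
        if item_id not in earliest or ts < earliest[item_id]:
            earliest[item_id] = ts
        if latest is None or ts > latest:
            latest = ts

    result = {}
    for item_id in item_ids:
        if item_id in provided:
            # An explicit CREATION_TIMESTAMP always wins.
            result[item_id] = provided[item_id]
        elif item_id in earliest:
            # Otherwise use the item's earliest interaction.
            result[item_id] = earliest[item_id]
        else:
            # No interactions: use the latest interaction in the training set.
            result[item_id] = latest
    return result


def item_age(creation_timestamp, latest_interaction_timestamp):
    """Age relative to the latest interaction; 0 marks a brand-new item."""
    return latest_interaction_timestamp - creation_timestamp
```

For example, with interactions `[("a", 100), ("a", 50), ("b", 200)]`, items `["a", "b", "c"]`, and a provided timestamp of 10 for `b`, item `a` gets 50, item `b` keeps 10, and item `c` (which has no interactions) gets 200, which gives it an age of 0.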
Our dataset for this post has 1,931 movies, of which 191 have a creation timestamp marked as the latest timestamp in the interaction dataset. These newest 191 items are considered cold items and have a label number higher than 1800 in the dataset.
Create your dataset and import the data with the following item dataset schema:
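For readers who script this step, here is a hedged sketch that registers an item schema containing `CREATION_TIMESTAMP`, creates the item dataset, and starts an import job. The `GENRES` metadata field, the resource names, the S3 location, and the IAM role are assumptions for illustration; align the field list with the actual columns of bandits-demo-items.csv.

```python
import json

ITEMS_SCHEMA = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        # Assumed metadata column; replace with the columns in your file.
        {"name": "GENRES", "type": "string", "categorical": True},
        # Optional reserved keyword used to compute item age.
        {"name": "CREATION_TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def import_items(personalize_client, dataset_group_arn, s3_path, role_arn):
    """Create the schema, the Items dataset, and an import job.

    All resource names below are hypothetical placeholders.
    """
    schema_arn = personalize_client.create_schema(
        name="bandits-demo-items-schema",
        schema=json.dumps(ITEMS_SCHEMA),
    )["schemaArn"]

    dataset_arn = personalize_client.create_dataset(
        name="bandits-demo-items",
        schemaArn=schema_arn,
        datasetGroupArn=dataset_group_arn,
        datasetType="Items",
    )["datasetArn"]

    import_job_arn = personalize_client.create_dataset_import_job(
        jobName="bandits-demo-items-import",
        datasetArn=dataset_arn,
        dataSource={"dataLocation": s3_path},
        roleArn=role_arn,
    )["datasetImportJobArn"]
    return dataset_arn, import_job_arn
```

In practice the dataset needs to finish creating before an import job can start against it, so a production script would check the dataset status between the last two calls.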
Training a model
After the dataset import jobs are complete, you’re ready to train a model.
- On the Amazon Personalize console, in the navigation pane, choose Solutions.
- Choose Create solution.
- For Solution name, enter a name for your solution (this post uses user-personalization-solution).
- For Recipe, choose aws-user-personalization.
This recipe combines deep learning models (RNNs) with bandits to provide you more accurate user modeling (high relevance) while also allowing for effective exploration of new items.
- Leave the Solution configuration section at its default values and choose Next.
- On the Create solution version page, choose Finish to start training.
When the training is complete, you can navigate to the Solution Version Overview page to see the offline metrics.
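The console steps above have an API equivalent. The following sketch creates a solution with the aws-user-personalization recipe, starts a full training run, and reads the offline metrics once the solution version is active. The function names are illustrative, and the default solution name simply reuses the one chosen later in this post.

```python
USER_PERSONALIZATION_RECIPE_ARN = (
    "arn:aws:personalize:::recipe/aws-user-personalization"
)


def start_training(personalize_client, dataset_group_arn,
                   name="user-personalization-solution"):
    """Create a solution and start training its first solution version."""
    solution_arn = personalize_client.create_solution(
        name=name,
        datasetGroupArn=dataset_group_arn,
        recipeArn=USER_PERSONALIZATION_RECIPE_ARN,
    )["solutionArn"]

    solution_version_arn = personalize_client.create_solution_version(
        solutionArn=solution_arn,
        trainingMode="FULL",
    )["solutionVersionArn"]
    return solution_arn, solution_version_arn


def offline_metrics(personalize_client, solution_version_arn):
    """Return the offline metrics, or None while training is still running."""
    status = personalize_client.describe_solution_version(
        solutionVersionArn=solution_version_arn
    )["solutionVersion"]["status"]
    if status != "ACTIVE":
        return None
    return personalize_client.get_solution_metrics(
        solutionVersionArn=solution_version_arn
    )["metrics"]
```

Pass `boto3.client("personalize")` as `personalize_client`. Training runs asynchronously, so `offline_metrics` returns `None` until the solution version reaches the `ACTIVE` status.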
Creating a campaign
In this step, you create a campaign using the solution created in the previous step.
- On the Amazon Personalize console, choose Campaigns.
- Choose Create Campaign.
- For Campaign name, enter a name.
- For Solution, choose user-personalization-solution.
- For Solution version ID, choose the solution version that uses the aws-user-personalization recipe.
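As a sketch of the same step through the API, the helpers below create a campaign from a solution version and then request recommendations while timing the call, which is one simple way to observe request latency from the client side. The campaign name and the `minProvisionedTPS` value of 1 are assumptions for illustration.

```python
import time


def create_campaign(personalize_client, solution_version_arn,
                    name="user-personalization-campaign"):
    """Deploy a solution version as a campaign and return the campaign ARN."""
    return personalize_client.create_campaign(
        name=name,  # hypothetical campaign name
        solutionVersionArn=solution_version_arn,
        minProvisionedTPS=1,  # assumed value; size this for your traffic
    )["campaignArn"]


def timed_recommendations(runtime_client, campaign_arn, user_id, num_results=10):
    """Return (item_ids, elapsed_ms) for one GetRecommendations call."""
    start = time.perf_counter()
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id),
        numResults=num_results,
    )
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    item_ids = [item["itemId"] for item in response["itemList"]]
    return item_ids, elapsed_ms
```

Here `personalize_client` would be `boto3.client("personalize")` and `runtime_client` would be `boto3.client("personalize-runtime")`. The measured time includes network round trip from wherever the code runs, so it is an upper bound on the service-side latency rather than a direct measurement of it.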
Retraining and updating campaigns
To update a model (solutionVersion), you can call the createSolutionVersion API with trainingMode set to UPDATE. This updates the model with the latest item information in the dataset previously used to train the solution and adjusts the exploration according to implicit feedback from the users. This is not equivalent to fully retraining the model, which you can do by setting trainingMode to FULL. Full training should be done less frequently, typically once every 1–5 days, depending on your use case.
When the new solutionVersion is created, you can point the campaign at it, either with the UpdateCampaign API or on the Amazon Personalize console, so that the campaign serves recommendations from the new version.
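Putting the two calls together, a retraining routine might look like the sketch below. It creates a new solution version in either UPDATE or FULL mode, polls until the version is active, and then updates the campaign. The function name, polling interval, and timeout are illustrative assumptions.

```python
import time


def retrain_and_deploy(personalize_client, solution_arn, campaign_arn,
                       training_mode="UPDATE", poll_seconds=60,
                       max_polls=240):
    """Create a solution version and move the campaign to it once ACTIVE."""
    if training_mode not in ("UPDATE", "FULL"):
        raise ValueError("training_mode must be 'UPDATE' or 'FULL'")

    version_arn = personalize_client.create_solution_version(
        solutionArn=solution_arn,
        trainingMode=training_mode,
    )["solutionVersionArn"]

    # Poll until training finishes; the interval and limit are assumptions.
    for _ in range(max_polls):
        status = personalize_client.describe_solution_version(
            solutionVersionArn=version_arn
        )["solutionVersion"]["status"]
        if status == "ACTIVE":
            break
        if status == "CREATE FAILED":
            raise RuntimeError("training failed for " + version_arn)
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("solution version did not become ACTIVE in time")

    personalize_client.update_campaign(
        campaignArn=campaign_arn,
        solutionVersionArn=version_arn,
    )
    return version_arn
```

One way to apply the guidance above is to run this with `training_mode="UPDATE"` on a frequent schedule and with `training_mode="FULL"` every few days, using whatever scheduler you already have.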
Conclusion
Product and content recommendations are only one part of an overarching personalization experience. End-to-end latency budgets require fast responses, and unnecessary latency decreases the impact and value of personalization for your users and business. The latency reductions in Amazon Personalize let you serve recommendations to your users faster. Additionally, the improved training efficiency helps you keep your recommendations relevant at a lower cost and in less time. For more information about training and deploying personalized recommendations for your users with Amazon Personalize, see What Is Amazon Personalize?
[1] https://www.akamai.com/us/en/multimedia/documents/report/akamai-state-of-online-retail-performance-2017-holiday.pdf
About the Authors
Deepesh Nathani is a Software Engineer with Amazon Personalize focused on building next-generation recommender systems. He is a Computer Science graduate from New York University. Outside of work, he enjoys water sports and watching movies.
Venkatesh Sreenivas is a Senior Software Engineer at Amazon Personalize and works on building distributed data science pipelines at scale. In his spare time, he enjoys hiking and exploring new technologies.
Matt Chwastek is a Senior Product Manager for Amazon Personalize. He focuses on delivering products that make it easier to build and use machine learning solutions. In his spare time, he enjoys reading and photography.