AWS Machine Learning Blog
From fridge to table: Use Amazon Rekognition and Amazon Bedrock to generate recipes and combat food waste
In today’s fast-paced world, time is of the essence and even basic tasks like grocery shopping can feel rushed and challenging. Despite our best intentions to plan meals and shop accordingly, we often end up ordering takeout, leaving unused perishable items to spoil in the refrigerator. This seemingly small issue of wasted groceries, paired with the about-to-perish supplies thrown away by grocery stores, contributes significantly to the global food waste problem. In this post, we demonstrate how you can help solve this problem by harnessing the power of generative AI on AWS.
By combining the computer vision capabilities of Amazon Rekognition with the content generation capabilities of foundation models (FMs) available through Amazon Bedrock, we developed a solution that recommends recipes based on what you already have in your refrigerator and on an inventory of about-to-expire items in local supermarkets. This makes sure that food both in your home and in grocery stores gets used, saving money and reducing waste.
In this post, we walk through how to build the FoodSavr solution (fictitious name used for the purposes of this post) using Amazon Rekognition Custom Labels to detect the ingredients and Anthropic’s Claude 3 on Amazon Bedrock to generate personalized recipes. We demonstrate an end-to-end architecture where a user can upload an image of their fridge, and using the ingredients detected there by Amazon Rekognition, the solution gives them a list of recipes generated by Amazon Bedrock. The architecture also recognizes missing ingredients and provides the user with a list of nearby grocery stores.
Solution overview
The following reference architecture shows how you can use Amazon Bedrock, Amazon Rekognition, and other AWS services to implement the FoodSavr solution.
As shown in the preceding figure, the architecture includes the following steps:
1. For an end-to-end solution, we recommend having a frontend where your users can upload images of items that they want detected and labeled. To learn more about frontend deployment on AWS, see Front-end Web & Mobile on AWS.
2. The picture taken by the user is stored in an Amazon Simple Storage Service (Amazon S3) bucket. This S3 bucket should be configured with a lifecycle policy that deletes the image after use. To learn more about S3 lifecycle policies, see Managing your storage lifecycle.
3. This architecture uses AWS Lambda functions. Lambda is a serverless AWS compute service that runs event-driven code and automatically manages the compute resources. The first Lambda function, DetectIngredients, harnesses the power of Amazon Rekognition by using the Boto3 Python API. Amazon Rekognition is a cutting-edge computer vision service that uses machine learning (ML) models to analyze the uploaded images.
4. We use Rekognition Custom Labels to train a model with a dataset of ingredients. You can adapt this architecture to use Rekognition Custom Labels with your own use case. With the aid of custom labels trained to recognize various ingredients, Amazon Rekognition identifies the items present in the images.
5. The detected ingredient names are then securely stored in an Amazon DynamoDB table (DynamoDB is a fully managed NoSQL database service) for retrieval and modification. Users are presented with a list of the ingredients that have been detected, along with the option to add other ingredients or delete any that they don’t want or that were misidentified (a minimal sketch of this step follows the list).
6. After the ingredient list is confirmed by the user through the web interface, they can initiate the recipe generation process with a click of a button. This action invokes another Lambda function called GenerateRecipes, which uses the advanced language capabilities of the Amazon Bedrock API (Anthropic’s Claude 3 in this post). This state-of-the-art FM analyzes the confirmed ingredient list retrieved from DynamoDB and generates relevant recipes tailored to those specific ingredients. Additionally, the model provides images to accompany each recipe, providing a visually appealing and inspiring culinary experience.
7. Amazon Bedrock provides two key FMs used in this solution example: Anthropic’s Claude 3 (newer versions have been released since the writing of this post) and Stable Diffusion, used for recipe generation and image generation, respectively. For this solution, you can use any combination of FMs that suit your use case. The generated content (recipes as text and recipe images, in this case) can then be displayed to the user on the frontend.
8. For this use case, you can also set up an optional ordering pipeline, which allows a user to place orders for the ingredients described by the FMs. This would be fronted by a Lambda function, FindGroceryItems, that can look for the recommended grocery items in a database contributed to by local supermarkets. This database would consist of about-to-expire ingredients along with prices for those ingredients.
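The following is a minimal sketch of how the DetectIngredients function might persist the detected labels to DynamoDB for step 5 (the FoodSavrIngredients table name, key schema, and attribute names are assumptions for illustration, not the exact implementation):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
# Table name and key schema are placeholders for this sketch
table = dynamodb.Table("FoodSavrIngredients")

def store_ingredients(user_id, labels):
    """Persist the ingredient names detected by Amazon Rekognition."""
    with table.batch_writer() as batch:
        for label in labels:
            batch.put_item(
                Item={
                    "UserId": user_id,                       # partition key (assumed)
                    "IngredientName": label["Name"],         # sort key (assumed)
                    "Confidence": str(label["Confidence"]),  # stored as a string to avoid float serialization issues
                }
            )
```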
In the following sections, we dive into how you can set up this architecture in your own account. Step 8 is optional and therefore not covered in this post.
Using Amazon Rekognition to detect images
The image recognition is powered by Amazon Rekognition, which offers pre-trained and customizable computer vision capabilities to allow users to obtain information and insights from their images. For customizability, you can use Rekognition Custom Labels to identify scenes and objects in your images that are specific to your business needs. If your images are already labeled, you can begin training a model from the Amazon Rekognition console. Otherwise, you can label them directly from the Amazon Rekognition labeling interface, or use other services such as Amazon SageMaker Ground Truth. The following screenshot shows an example of what the bounding box process would look like on the Amazon Rekognition labeling interface.
To get started with labeling, see Using Amazon Rekognition Custom Labels and Amazon A2I for detecting pizza slices and augmenting predictions. For this architecture, we collected a dataset of up to 70 images of common food items typically found in refrigerators. We recommend that you gather your own relevant images and store them in an S3 bucket to use for training with Amazon Rekognition. You can then use Rekognition Custom Labels to create labels with food names, and assign bounding boxes on the images so the model knows where to look. To get started with training your own custom model, see Training an Amazon Rekognition Custom Labels model.
When model training is complete, you will see all your trained models under Projects on the AWS Management Console for Amazon Rekognition. Here, you can also look at the model performance, measured by the F1 score (shown in the following screenshot).
You can also iterate and modify your existing models to create newer versions. Before using your model, make sure it’s in the RUNNING state. To use the model, choose the model you want to use, and on the Use model tab, choose Start.
You also have the option to programmatically start and stop your model (the exact API call can be copied from the Amazon Rekognition console, but the following is provided as an example):
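A minimal Boto3 sketch of starting the model is shown here (the project version ARN is a placeholder; copy your own from the console):

```python
import boto3

rekognition = boto3.client("rekognition")

# Placeholder ARN; replace with the project version ARN copied from the Amazon Rekognition console
PROJECT_VERSION_ARN = (
    "arn:aws:rekognition:us-east-1:123456789012:project/FoodSavr/version/FoodSavr.2024-01-01/1"
)

rekognition.start_project_version(
    ProjectVersionArn=PROJECT_VERSION_ARN,
    MinInferenceUnits=1,  # minimum capacity; increase for higher throughput
)
```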
Use the following API (which is present in the Lambda function) call to detect groceries in an image using your custom labels and custom models:
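The following sketch shows what that call can look like (the S3 bucket and object key are placeholders, and rekognition and PROJECT_VERSION_ARN are reused from the previous snippet):

```python
response = rekognition.detect_custom_labels(
    ProjectVersionArn=PROJECT_VERSION_ARN,
    Image={
        "S3Object": {
            "Bucket": "foodsavr-uploads",  # placeholder bucket holding the uploaded image
            "Name": "fridge-photo.jpg",    # placeholder object key
        }
    },
    MinConfidence=70,  # only return labels detected with at least 70% confidence
)

detected_ingredients = [label["Name"] for label in response["CustomLabels"]]
```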
To stop incurring costs, you can also stop your model when not in use:
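Using the same client and ARN as in the previous snippets:

```python
rekognition.stop_project_version(ProjectVersionArn=PROJECT_VERSION_ARN)
```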
Because we’re using Python, the boto3 Python package is used to make all AWS API calls mentioned in this post. For more information about Boto3, see the Boto3 documentation.
Starting a model might take a few minutes to complete. To check the current status of the model readiness, check the details page for the project or use DescribeProjectVersions. Wait for the model status to change to RUNNING.
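For example, you can poll the status with Boto3 (the project ARN is a placeholder):

```python
# Placeholder ARN; replace with your project ARN (not the project version ARN)
PROJECT_ARN = "arn:aws:rekognition:us-east-1:123456789012:project/FoodSavr/1234567890123"

response = rekognition.describe_project_versions(ProjectArn=PROJECT_ARN)
for version in response["ProjectVersionDescriptions"]:
    print(version["ProjectVersionArn"], version["Status"])  # proceed when Status == "RUNNING"
```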
In the meantime, you can explore the different statistics provided by Amazon Rekognition about your model. Some notable ones are the model performance (F1 score), precision, and recall. These statistics are gathered by Amazon Rekognition at both the model level (as seen in the earlier screenshot) and the individual custom label level (as shown in the following screenshot).
For more information on these statistics, see Metrics for evaluating your model.
Be aware that, while Anthropic’s Claude models offer impressive multi-modal capabilities for understanding and generating content based on text and images, we chose to use Amazon Rekognition Custom Labels for ingredient detection in this solution. Amazon Rekognition is a specialized computer vision service optimized for tasks such as object detection and image classification, using state-of-the-art models trained on massive datasets. Additionally, Rekognition Custom Labels allows us to train custom models tailored to recognize specific food items and ingredients, providing a level of customization that might not be as straightforward with a general-purpose language model. Furthermore, as a fully managed service, Amazon Rekognition can scale seamlessly to handle large volumes of images. While a hybrid approach combining Rekognition and Claude’s multi-modal capabilities could be explored, we chose Rekognition Custom Labels for its specialized computer vision capabilities, customizability, and to demonstrate combining FMs on Amazon Bedrock with other AWS services for this specific use case.
Using Amazon Bedrock FMs to generate recipes
To generate the recipes, we use Amazon Bedrock, a fully managed service that offers high-performing FMs. We use the Amazon Bedrock API to query Anthropic’s Claude 3 Sonnet model. We use the following prompt to provide context to the FM:
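The exact wording will vary with your requirements; the following is an illustrative sketch of such a prompt, assembled in Python from the detected_ingredients list produced earlier (the wording and output format are assumptions for illustration):

```python
# Illustrative prompt template; adjust the wording and output format to your needs
prompt = f"""You are a helpful cooking assistant. Here is a list of ingredients detected
in a user's refrigerator: {', '.join(detected_ingredients)}.

Suggest three recipes that primarily use these ingredients. For each recipe, include a
title, the ingredients used, any common ingredients that are missing, and step-by-step
instructions. Return the response as JSON."""
```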
The following code is the body of the Amazon Bedrock API call:
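The following is a sketch of that call using the Boto3 bedrock-runtime client and the Anthropic Messages API request format (the prompt variable comes from the preceding example; the max_tokens and temperature values are illustrative):

```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

body = json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,   # illustrative limit for the generated recipes
        "temperature": 0.5,   # illustrative value; tune for more or less creative output
        "messages": [
            {
                "role": "user",
                "content": [{"type": "text", "text": prompt}],
            }
        ],
    }
)

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=body,
    contentType="application/json",
    accept="application/json",
)

response_body = json.loads(response["body"].read())
recipes_text = response_body["content"][0]["text"]  # the generated recipes as text
```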