AWS for Games Blog

Using generative AI to analyze game reviews from players and press

Professional game reviewers and players both provide essential feedback that helps game developers and studios improve their games. Professional reviews offer expert analysis of technical aspects and design, while player reviews provide insights into real-world experiences and practical issues encountered during gameplay.

Game developers, studios, and publishers face significant challenges when evaluating game reviews due to the sheer volume and diversity of feedback. To address these challenges effectively, developers need robust systems to categorize and prioritize feedback so they can focus on the most critical issues. This is especially challenging for smaller studios, which often struggle to manage large amounts of feedback with limited staff and financial resources.

We will demonstrate how to build a serverless solution that allows developers to upload, process, analyze, and summarize game reviews using Amazon Bedrock. While this example focuses on game reviews, this approach can be adapted to analyze and summarize reviews from other domains.

Solution overview

The solution for sentiment analysis, classification and summarization of game reviews consists of six main components:

  1. User experience
  2. Request management
  3. Workflow orchestration for sentiment analysis and classification
  4. Data and metadata storage
  5. Summarization
  6. Monitoring

The following diagram (Figure 1) illustrates the architecture.

High-level AWS architecture diagram showing the six main components of the serverless solution (user experience, request management, workflow orchestration, data and metadata storage, summarization, and monitoring), built from Amazon Cognito, Amazon CloudFront, Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda, AWS Step Functions, Amazon DynamoDB, and Amazon Bedrock, with Amazon CloudWatch monitoring the entire system.

Figure 1: High level serverless architecture for game review analysis and summarization using Amazon Bedrock.

User experience: The solution contains a static web application hosted in Amazon Simple Storage Service (Amazon S3). We deploy an Amazon CloudFront distribution to serve this static website and implement origin access control (OAC) to restrict access to the Amazon S3 origin. Additionally, we use Amazon Cognito to protect the web application from unauthorized access.

Request management: We use Amazon API Gateway as the entry point for all near real-time communication between the UI application and the APIs exposed by the different workloads of the solution. Through this gateway, users can initiate requests to create, read, update, and delete (CRUD) data, as well as to run workflows. These API requests invoke AWS Lambda functions that send the pre-processed requests to AWS Step Functions, and that retrieve and summarize reviews.
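As a minimal sketch (the handler shape, the request field names, and the STATE_MACHINE_ARN environment variable are illustrative assumptions, not taken from the solution's code), a Lambda function behind API Gateway that forwards a pre-processed request to AWS Step Functions could look like this:

import json
import os

import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    # Hypothetical API Gateway proxy handler that starts the analysis workflow.
    body = json.loads(event.get("body") or "{}")

    # Pre-process the request: keep only the fields the workflow needs.
    workflow_input = {
        "gameId": body["gameId"],
        "jobId": body["jobId"],
        "reviewsS3Key": body["reviewsS3Key"],
    }

    response = sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # assumed configuration
        input=json.dumps(workflow_input),
    )

    return {
        "statusCode": 202,
        "body": json.dumps({"executionArn": response["executionArn"]}),
    }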

Workflow orchestration for sentiment analysis and classification: The sentiment analysis and classification of game reviews begins by creating a JSONL file containing the necessary prompt and properties required for analyzing each review. Using Anthropic Claude 3.5 Sonnet, a large language foundation model hosted in Amazon Bedrock, we process the game reviews in batches.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies. It also enables you to bring your own custom models and use them seamlessly on Amazon Bedrock. We encourage you to experiment with different models to find what works best for your company’s situation.

After the Amazon Bedrock job completes, the batch analysis results are stored in an S3 bucket. We then read the results from the S3 bucket and store them in Amazon DynamoDB, enabling users to query the results and filter game reviews based on their topic classification and sentiment.

Data and metadata storage: This solution leverages Amazon S3 for storing uploaded game reviews and output results, providing durable, highly available, and scalable data storage at a low cost. We use Amazon DynamoDB, a NoSQL database service, to store all analysis and job metadata, allowing users to track batch job status and other relevant information efficiently.

Monitoring: The solution stores the logs in Amazon CloudWatch Logs, providing invaluable monitoring information during both development and live operations.

Prerequisites

Before you start, you will need to download the solution and review the complete instructions on how to use it from our GitHub repository.

Solution walkthrough

This walkthrough focuses on two key aspects of the solution:

  1. The AWS Step Functions workflow for sentiment analysis and topic classification using the Amazon Bedrock Batch Inference API
  2. The Amazon Bedrock Converse API for summarizing game reviews

The first step is creating a game and an analysis job, as displayed in Figure 2.

A web interface for "Game Reviews Analysis" showing a newly created game "ExampleGame" with an associated analysis job. The interface displays a table with one job titled "Example Analysis Job" with a status of "Not Submitted". An actions menu for the job is open, revealing options including "Details", "Update Status", "Upload Reviews", "Submit Job", and "Delete". The user has created a game and an analysis job, and is ready to upload a CSV file with 1000 reviews before submitting the job for analysis.

Figure 2: Web application interface for managing game reviews analysis tasks, including adding new games, editing game details, and creating jobs to process and analyze review data.

The solution generates an Amazon S3 presigned URL, allowing the website to upload the CSV file containing the game reviews directly to Amazon S3 in a secure manner. The API expects the CSV file to include id and review columns. Once the file is successfully uploaded to Amazon S3, the user can start the analysis job.
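A minimal sketch of generating such a presigned upload URL with the AWS SDK for Python (Boto3) might look like the following (the bucket and key names are illustrative):

import boto3

s3 = boto3.client("s3")

def create_upload_url(bucket: str, key: str, expires_in: int = 300) -> str:
    # Returns a time-limited URL the browser can use to PUT the reviews CSV.
    return s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": "text/csv"},
        ExpiresIn=expires_in,
    )

# Example (hypothetical names):
# upload_url = create_upload_url("game-reviews-bucket", "uploads/example-game/job-1/reviews.csv")

Because the Content-Type is part of the signed request, the browser must send the same header when it uploads the file.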

The solution uses the Amazon Bedrock batch inference API to efficiently process large numbers of requests asynchronously. This feature requires input data to be in JSONL format and stored in Amazon S3.

The first Lambda function in the Step Functions workflow is responsible for transforming the uploaded CSV file into a JSONL file and storing it in Amazon S3 before processing. Amazon Bedrock batch inference requires each line in the JSONL file to follow a specific format, as shown in the following example:

{
    "recordId": "111111111",
    "modelInput": {
        "temperature": 0.0,
        "top_k": 1,
        "top_p": 1.0,
        "max_tokens": 2000,
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "prompt here"
                    }
                ]
            }
        ]
    }
}

The modelInput object must follow the format specific to the underlying foundation model. In this solution we use Anthropic Claude 3.5 Sonnet, which requires several configuration parameters:

  • Temperature: Determines response randomness, ranging from 0 to 1. Higher values produce more creative responses (such as 0.8), while lower values generate more consistent outputs (such as 0.2).
  • Top_p: Sets the cumulative probability cutoff used when sampling tokens. Values closer to one yield more creative responses, while values closer to zero produce more predictable outputs. You should alter either temperature or top_p, but not both.
  • Top_k: This optional parameter, ranging from 0 to 500, limits the number of word choices available to the model at each step.
  • Max_tokens: Defines the maximum number of tokens returned in the model’s response.
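A simplified sketch of the CSV-to-JSONL transformation described above (the bucket and key names are placeholders, and prompt_template stands in for the classification prompt discussed next) could look like this:

import csv
import io
import json

import boto3

s3 = boto3.client("s3")

def csv_to_jsonl(bucket: str, csv_key: str, jsonl_key: str, prompt_template: str) -> None:
    # Read the uploaded reviews CSV (with id and review columns) and write a
    # Claude-formatted JSONL file that Amazon Bedrock batch inference can consume.
    csv_body = s3.get_object(Bucket=bucket, Key=csv_key)["Body"].read().decode("utf-8")
    lines = []
    for row in csv.DictReader(io.StringIO(csv_body)):
        record = {
            "recordId": row["id"],
            "modelInput": {
                "temperature": 0.0,
                "top_k": 1,
                "top_p": 1.0,
                "max_tokens": 2000,
                "anthropic_version": "bedrock-2023-05-31",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {"type": "text", "text": prompt_template.format(review=row["review"])}
                        ],
                    }
                ],
            },
        }
        lines.append(json.dumps(record))
    s3.put_object(Bucket=bucket, Key=jsonl_key, Body="\n".join(lines).encode("utf-8"))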

In the prompt, we instruct the model to perform two types of analysis:

  1. Determine the overall sentiment of the review as either Positive, Negative, or Neutral.
  2. Classify the topics covered by the review based on this list: Price, Sound, Story, Support, Controls, Gameplay, Graphics, Multiplayer, and Performance. Then, for each identified topic, determine the sentiment as either Positive, Negative, or Neutral.

As part of the prompt, we also provide example game review inputs and the expected model outputs. This technique is called “few-shot” prompting.

This structured classification approach allows us to perform further comprehensive analysis based on topic and sentiment. The prompt for this solution example can be found here; you may need to tweak parts of it to find what works best for your company’s situation.
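For illustration only, a simplified classification prompt along these lines could serve as the prompt_template in the earlier sketch (the real prompt in the repository also contains the few-shot examples mentioned above):

# Hypothetical, simplified prompt; not the exact prompt shipped with the solution.
PROMPT_TEMPLATE = """You are analyzing a player review of a video game.

1. Determine the overall sentiment of the review: Positive, Negative, or Neutral.
2. Classify the topics covered by the review using only this list: Price, Sound,
   Story, Support, Controls, Gameplay, Graphics, Multiplayer, Performance.
   For each identified topic, give its sentiment: Positive, Negative, or Neutral.

Return your answer inside <result></result> tags as JSON with the keys
"overall_sentiment" and "classifications".

Review:
{review}
"""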

As the next step in the same Lambda function, we create a new batch inference job using the Amazon Bedrock CreateModelInvocationJob API through the AWS SDK for Python (Boto3). You can find the full code here.
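The essence of that call is a single Boto3 request; a minimal sketch (the job name, role ARN, model ID, and S3 URIs are placeholders) looks like this:

import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_model_invocation_job(
    jobName="game-reviews-analysis-job",  # illustrative name
    roleArn="arn:aws:iam::123456789012:role/BedrockBatchInferenceRole",  # placeholder
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://game-reviews-bucket/batch-input/reviews.jsonl"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://game-reviews-bucket/batch-output/"}
    },
)
job_arn = response["jobArn"]  # used later to poll the job status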

Next, we monitor the status of the batch inference job. This is accomplished through an AWS Lambda function that polls the job status using the Amazon Bedrock GetModelInvocationJob API, as shown in the code here.
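A sketch of that status check (the workflow passes in the job ARN returned when the job was created):

import boto3

bedrock = boto3.client("bedrock")

def get_job_status(job_arn: str) -> str:
    # Returns statuses such as Submitted, InProgress, Completed, or Failed.
    # The Step Functions workflow re-invokes this check after a wait state
    # until the job reaches a terminal status.
    response = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
    return response["status"]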

The final step of the Step Functions workflow involves processing and storing the batch job results.

First, we load the results from the previously specified Amazon S3 URI. Since the batch inference job outputs data in JSONL format, we parse this data and store the individual results as items in an Amazon DynamoDB table. We named ours “GameReviewsTable”.

The following is an example of a JSON Lines record output by Amazon Bedrock batch inference.

{
    "modelInput": {
        "temperature": 0.0,
        "top_k": 1,
        "top_p": 1.0,
        "max_tokens": 2000,
        "anthropic_version": "bedrock-2023-05-31",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "prompt"
                    }
                ]
            }
        ]
    },
    "modelOutput": {
        "id": "uuid",
        "type": "message",
        "role": "assistant",
        "model": "claude-3-sonnet-20240229",
        "content": [
            {
                "type": "text",
                "text": "<result>{\"overall_sentiment\":\"Positive\",\"classifications\":[{\"topic\":\"Gameplay\",\"sentiment\":\"Positive\"},{\"topic\":\"Multiplayer\",\"sentiment\":\"Positive\"}]}</result>"
            }
        ],
        "stop_reason": "end_turn",
        "stop_sequence": null,
        "usage": {
            "input_tokens": 257,
            "output_tokens": 44
        }
    },
    "recordId": "111111111"
}
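A sketch of how records in this format could be parsed and stored (the DynamoDB key attribute names are assumptions; the solution’s actual code is in the repository):

import json
import re

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("GameReviewsTable")

def store_batch_results(jsonl_text: str, job_id: str) -> None:
    # Parse each Bedrock batch output line, extract the <result> JSON produced by
    # the model, and store the sentiment and topic classifications in DynamoDB.
    for line in jsonl_text.splitlines():
        record = json.loads(line)
        model_text = record["modelOutput"]["content"][0]["text"]
        match = re.search(r"<result>(.*?)</result>", model_text, re.DOTALL)
        if not match:
            continue  # skip records the model did not answer in the expected format
        result = json.loads(match.group(1))
        table.put_item(
            Item={
                "jobId": job_id,                 # assumed partition key
                "reviewId": record["recordId"],  # assumed sort key
                "overallSentiment": result["overall_sentiment"],
                "classifications": result["classifications"],
            }
        )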

Summarization

Our solution uses the Converse API of Amazon Bedrock for summarizing game reviews. One of its key features is the ability to call a tool (tool use), which enables models to interact with external systems for more accurate, contextual, and up-to-date responses.

The summarization process begins when a user enters a prompt, such as “What aspects of my game’s gameplay should I improve?”. The Converse API then analyzes this prompt and creates a request payload for the GamesCRUD API (a Lambda function). During analysis, it automatically determines the relevant parameters required by the GamesCRUD API, including sentiment (such as “Negative”) and topic classification (such as “Gameplay”).

After analysis, the Lambda function queries the API using the generated JSON payload. This retrieves relevant game reviews based on the determined sentiment and topic classification. These retrieved reviews are then passed to the large language model (LLM), which generates a comprehensive summary of the relevant feedback.
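A compressed sketch of that flow with the Converse API (the tool name, input schema, and model ID are illustrative; the real GamesCRUD integration lives in the solution’s Lambda code):

import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Hypothetical tool definition wrapping the GamesCRUD API.
tool_config = {
    "tools": [
        {
            "toolSpec": {
                "name": "get_game_reviews",
                "description": "Retrieve game reviews filtered by sentiment and topic.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "sentiment": {"type": "string", "enum": ["Positive", "Negative", "Neutral"]},
                            "topic": {"type": "string"},
                        },
                        "required": ["sentiment", "topic"],
                    }
                },
            }
        }
    ]
}

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "What aspects of my game's gameplay should I improve?"}]}],
    toolConfig=tool_config,
)

# When the model decides to call the tool, stopReason is "tool_use". The Lambda
# function then calls the GamesCRUD API with the generated parameters, appends the
# retrieved reviews to the conversation as a tool result, and calls converse() again
# to obtain the final summary.
if response["stopReason"] == "tool_use":
    tool_use = next(
        block["toolUse"]
        for block in response["output"]["message"]["content"]
        if "toolUse" in block
    )
    print(tool_use["name"], json.dumps(tool_use["input"]))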

This streamlined process ensures that users receive targeted, relevant summaries based on their specific queries about game feedback. The system’s ability to understand context and filter relevant reviews makes it particularly effective for developers seeking specific insights about their games.

Summarization response in the UI from negative game reviews classified as “Gameplay”. The prompt is: “What aspects of my game’s gameplay should I improve?”. The response is:

"The reviews indicate that the gameplay aspects that need the most improvement are:

  1. Server stability and performance - Many players are experiencing frequent disconnections, server crashes, and other technical issues that make the game unplayable at times, especially on the EU servers. This is a major pain point that needs to be addressed.
  2. Communication from the developers - There is a lack of transparency and communication from the developers about the server issues and plans to fix them. Players feel frustrated by the lack of updates and compensation for the downtime.
  3. Progression/leveling system - Some players feel the progression and leveling is too fast or too grindy, lacking a satisfying feeling of accomplishment. The gear upgrading system also seems to need refinement.
  4. Controls and customization - While the combat is generally praised, some players find the controls, especially for controller users, to be lacking in customization options and intuitive feel.

In summary, the core gameplay seems promising, but the technical issues and lack of polish in certain systems are major detractors that need to be improved. The developers should focus on stabilizing the servers, improving communication, and refining the progression/leveling mechanics to create a more polished and enjoyable experience."

Figure 3: Summarization response in the UI from negative game reviews classified as “GamePlay”.

Conclusion

This solution demonstrates a comprehensive serverless architecture for analyzing and summarizing game reviews on Amazon Web Services (AWS), particularly leveraging the capabilities of Amazon Bedrock. It handles the complex tasks of sentiment analysis, topic classification, and review summarization through a serverless pipeline, and highlights the power of the Amazon Bedrock batch inference API for processing large volumes of reviews and the Converse API for generating intelligent summaries with its call a tool capability.

Care needs to be taken when using this solution. Players often use sarcasm to express negative feelings, and understanding context is crucial, as seemingly negative phrases might simply be neutral descriptions of gameplay experiences. While these challenges can be addressed through human verification, the use of multiple language models, and prompt engineering, such investigations are beyond the scope of this post.

Contact an AWS Representative to learn how we can help accelerate your business.

Further reading

  1. Serverless Computing on AWS: Read about the benefits and best practices of building serverless applications on AWS, including the services used in this solution, such as AWS Lambda, Amazon S3, and Amazon DynamoDB.
  2. AWS Step Functions Documentation: Dive deeper into AWS Step Functions, the workflow orchestration service used in this solution, to understand how to build and manage complex, distributed applications.
  3. Best Practices for Game Development on AWS: Discover AWS-specific guidance and resources for game developers, including architecture patterns, security considerations, and performance optimization.
  4. Prompt Engineering for Large Language Models: Learn more about the art of crafting effective prompts for large language models, which is crucial for the sentiment analysis and classification tasks in this solution.

Tolis Christomanos

Tolis Christomanos is a Senior Solutions Architect at AWS. With over two decades in the industry, he has extensive experience in developing and architecting a wide range of games and gaming platforms. He collaborates closely with major game studios, providing expert guidance on leveraging AWS for optimal success.

Jeremy Bartosiewicz

Jeremy Bartosiewicz is a Senior Solutions Architect at AWS, with over 15 years of experience working in technology in multiple roles. Coming from a consulting background, Jeremy enjoys working on a multitude of projects that help organizations grow using cloud solutions. He helps support large enterprise customers at AWS and is part of the Advertising and Machine Learning TFCs.

Mehran Nikoo

Mehran Nikoo is a Generative AI Go-To-Market Specialist at AWS and leads the generative AI go-to-market strategy for the UK and Ireland.

Talha Chattha

Talha Chattha is a Senior Generative AI Specialist SA at AWS. With 10+ years of experience working with AI, Talha now helps establish practices to ease the path to production for generative AI workloads. Talha is an expert in Amazon Bedrock and supports customers across EMEA.