IBM & Red Hat on AWS
Monitoring Amazon Bedrock Large Language Models with IBM Instana
With the growing adoption of large language models (LLMs) across applications like Natural Language Processing (NLP) and advanced Artificial Intelligence (AI) tasks, ensuring robust observability has become crucial for maintaining performance and reliability.
Amazon Bedrock enables you to easily build and scale generative AI applications with foundation models, while IBM Instana (Instana) provides comprehensive observability solutions tailored to monitor these applications, detect anomalies, and ensure optimal operation.
In this post, we explain how to set up observability for Amazon Bedrock applications with Instana and demonstrate its value using a practical test case.
Solution overview
Instana’s integration for Amazon Bedrock empowers organizations to monitor their AI-driven workloads with real-time metrics and distributed tracing capabilities. By leveraging Instana’s OpenTelemetry (OTel) capabilities, including Traceloop’s OpenLLMetry, you can seamlessly monitor LLM deployments like Amazon Bedrock.
With this integration, you can track key performance indicators such as:
- Model latency
- Token and cost usage
- API request volume
Additionally, you can trace the entire request lifecycle and pinpoint performance bottlenecks or failures using Instana’s distributed tracing capabilities, which are extended for Amazon Bedrock model API calls.
Reference Architecture
Instana uses OTel to enable observability for Amazon Bedrock applications. Through this integration, Instana collects telemetry data from Bedrock model API calls, supporting both Agent Mode and Agentless Mode for flexible deployment. With Instana, you can monitor key metrics and traces from Bedrock workloads, providing detailed insights into performance and request flows as seen in the following reference architecture diagram (Figure 1).
Agent Mode
With Agent Mode, data is first sent to the Instana Agent, where it is processed by the LLM sensor before being sent to the Instana Backend. This mode requires you to have an Instana Agent installed in your environment. The Instana Agent takes care of the collection, and you don’t need to set up a separate component in your environment. The Instana host agent supporting LLM collections can be deployed on Amazon Elastic Compute Cloud (EC2). For more details about host agents, refer to the IBM Instana documentation.
Agentless Mode
With Agentless Mode, data bypasses the agent and is sent directly to the Instana Backend, connecting directly with the OTel acceptor. This mode requires you to install and maintain an OTel Collector within your environment. This allows you the flexibility of an agentless environment, but you are responsible for maintenance and resource costs.
Amazon Bedrock observability setup
Let’s look at how to set up observability for Amazon Bedrock applications using Instana. We’ll use an application that simulates a ChatBot service invoking Amazon Bedrock LLMs.
After configuring Instana to monitor the application, you’ll be able to collect the following metrics:
- Token usage: Track the amount of token usage for models.
- Cost usage: Monitor the cost associated with the use of Amazon Bedrock services.
- API request count: Count and track the number of API requests sent to Amazon Bedrock.
- Model latency: Measure the response time and latency of model inference.
Moreover, you’ll utilize Instana’s Distributed Tracing capabilities for Amazon Bedrock model API calls to track the entire request lifecycle and pinpoint performance bottlenecks or failures.
Lastly, you’ll leverage Instana to create Smart Alerts based on errors detected in these API calls. Through Error Detection and Smart Alerts, you can identify specific issues, such as failures caused by an invalid AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY. Instana generates detailed error messages, including error type and root cause, to aid troubleshooting.
Prerequisites
Before you begin, ensure you have the following:
- An AWS account with Amazon Bedrock services configured.
- An Instana account. If you don’t have one, you can start a 14-day free trial from Instana’s PayPerUse offering in the AWS Marketplace.
- Instana configured with OpenTelemetry (OTel).
- IAM roles configured for AWS Lambda to access Amazon Bedrock.
- AWS Lambda attached to an Amazon VPC for access to Amazon EC2 instances for Instana and the OpenTelemetry Connector.
Cost
You are responsible for the cost of the AWS services used when deploying the solution described in this blog in your AWS account. For cost estimates, refer to the pricing pages for each AWS service or use the AWS Pricing Calculator. Some of the AWS services that may incur costs include Amazon Bedrock, AWS Lambda, and Amazon EC2.
Prepare the Amazon Bedrock application
Disclaimer: Please note that the sample code provided in this blog post is for demonstration purposes only and isn’t intended for production use. The code is not hosted and managed by AWS. We encourage you to review the code to understand how it works. For help with third-party components like IBM Instana, please refer to the vendor documentation and support channels.
For the use case described in this blog, we are using a React Agent application that invokes Amazon Bedrock. You can access the sample application code from the IBM GitHub repository. In the Python code, the application is named chat_bot_service. This name will be used later in Instana to filter and identify the Amazon Bedrock application.
To set up observability for our Amazon Bedrock sample application, we begin by enabling Traceloop, which serves as the foundation for Amazon Bedrock observability. The Traceloop SDK will instrument the Amazon Bedrock application using OpenTelemetry. Below is the relevant code snippet:
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task, workflow
Traceloop.init(app_name="chat_bot_service")
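By default, the Traceloop SDK exports telemetry to Traceloop’s own cloud endpoint, so you’ll want to point it at your local collection endpoint instead. One way to do this is with the TRACELOOP_BASE_URL environment variable; the host and port below are assumptions for illustration — use the listen address of the Data Collector (or agent) you configure later in this post.

```shell
# Point the Traceloop SDK at a local OTel endpoint instead of the default
# Traceloop cloud endpoint. "localhost:8000" is an assumed address; use the
# otel.service.port you configure for the Data Collector (default 8000).
export TRACELOOP_BASE_URL="http://localhost:8000"
# Then start the instrumented application, for example:
# python3 ./bedrock-demo.py
```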
Next, make sure to add @workflow and @task annotations for your functions. This will help chain a multi-step process and trace it as a single unit:
@workflow(name="bedrock_chat_query")
def query(question, max_turns=3):
    logger.info(f"Query Bedrock for question: {question} ...")

@task(name="chat_bot_wiki")
def wikipedia(q):
    logger.warning("call wikipedia https://en.wikipedia.org/w/api.php ...")
    return httpx.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query",
        "list": "search",
        "srsearch": q,
        "format": "json"
    }).json()["query"]["search"][0]["snippet"]
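The workflow’s LLM step itself calls Amazon Bedrock through the AWS SDK, and those InvokeModel calls are what the Traceloop SDK instruments. The sketch below is an illustrative, minimal Claude v2 invocation — the helper names and region are our assumptions, not the sample repository’s exact code:

```python
import json

# Illustrative sketch (not the sample repository's exact code): build the
# request body that Anthropic Claude v2 expects from Bedrock's InvokeModel API.
def build_claude_body(question: str, max_tokens: int = 256) -> str:
    return json.dumps({
        "prompt": f"\n\nHuman: {question}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
    })

def ask_bedrock(question: str) -> str:
    # Requires valid AWS credentials; "bedrock-runtime" is the service
    # endpoint for model invocation, and the region here is an assumption.
    import boto3
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="anthropic.claude-v2",
        body=build_claude_body(question),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["completion"]
```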
To keep the program running in the background, you can create a script with a while loop:
while :
do
    python3 ./bedrock-demo.py
    sleep 60
done
Configuring the Instana Metric Collection for Amazon Bedrock
To collect OTel metrics from your LLM, you’ll need a dedicated host, such as an Amazon EC2 instance, to run the Data Collector. Follow these steps:
- Install Java JDK 11 or higher. Run the following command to check the installed version:
java -version
- Enable OpenTelemetry data ingestion. For more information, see Sending OpenTelemetry data to Instana.
- Download and install the Data Collector:
wget https://github.com/instana/otel-dc/releases/download/v1.0.5/otel-dc-llm-1.0.5.tar
tar xf otel-dc-llm-1.0.5.tar
- Open and modify the configuration file:
cd otel-dc-llm-1.0.5
vi config/config.yaml
- Update the following fields in the yaml file:
- Connection Mode (agentless.mode):
- Set to true to send metrics data directly to the Instana backend (agentless mode).
- Set to false to send metrics data to the Instana agent (agent mode).
- For more information on the Instana SaaS environment endpoints, see the Instana Backend otlp-acceptor Endpoints documentation.
- Backend URL (backend.url):
- In agentless mode, set this to the gRPC endpoint of the Instana backend otlp-acceptor component. Use the https:// scheme if the otlp-acceptor endpoint in your SaaS environment is TLS-enabled. For more information, see Endpoints of the Instana backend otlp-acceptor.
- In agent mode, set this to the gRPC endpoint of the Instana agent, for example http://<instana-agent-host>:4317.
- Callback Interval (callback.interval): Set the time interval (in seconds) at which data is posted to the backend or agent.
- Service Name (otel.service.name): Set a name for the Data Collector service.
- Service Port (otel.service.port): Set the listen port for the Data Collector to receive metrics data from instrumented applications (default is 8000).
- After configuring the config.yaml file, save your changes and proceed to the next step.
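Putting the fields above together, a config.yaml for agent mode might look like the fragment below. The values are placeholders and the exact file layout may differ between otel-dc-llm releases — treat this as a sketch of the fields discussed above, not the authoritative format:

```yaml
# Illustrative values only; adjust to your environment.
agentless.mode: false                          # false = send data via the Instana agent
backend.url: http://<instana-agent-host>:4317  # gRPC endpoint (agent mode)
callback.interval: 30                          # seconds between posts
otel.service.name: otel-dc-llm
otel.service.port: 8000                        # listen port for instrumented apps
```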
- Edit the properties file to customize the pricing for models in use. Use the following format in the prices.properties file:
<aiSystem>.<modelId>.input=0.0
<aiSystem>.<modelId>.output=0.0
Note: <aiSystem> is the LLM provider in use, and <modelId> can be set to an asterisk (*) to match all model IDs within the <aiSystem>. Entries with a specific model ID take priority over wildcard entries.
- For our Amazon Bedrock integration example, we are using the following configuration:
bedrock.anthropic.claude-v2.input=0.008
bedrock.anthropic.claude-v2.output=0.024
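These values match how Amazon Bedrock quotes Claude v2 pricing, per 1,000 tokens. As a rough back-of-the-envelope check of the cost metric (our arithmetic to illustrate the idea, not Instana’s actual code):

```python
# Rough cost estimate from token counts and the prices.properties values above.
# Assumption: prices are per 1,000 tokens, matching how Amazon Bedrock quotes
# Claude v2 pricing; this mirrors the idea of the cost metric, not Instana's code.
PRICES = {"bedrock.anthropic.claude-v2": {"input": 0.008, "output": 0.024}}

def estimate_cost(model_key: str, input_tokens: int, output_tokens: int) -> float:
    price = PRICES[model_key]
    return (input_tokens / 1000) * price["input"] + \
           (output_tokens / 1000) * price["output"]

# Example: 1,500 prompt tokens and 500 completion tokens:
cost = estimate_cost("bedrock.anthropic.claude-v2", 1500, 500)
print(f"${cost:.4f}")  # $0.0240
```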
- Edit the logging properties file by running the following command:
vi config/logging.properties
Note: Configure the Java logging settings in the logging.properties file according to your needs.
- Run the Data Collector in the background with the following command:
nohup ./bin/otel-dc-llm >/dev/null 2>&1 &
You can use tools like tmux or screen to run the program in the background.
Amazon Bedrock Observability in IBM Instana Console
- Log in to the Instana dashboard with your username and password.
- Navigate to the Infrastructure section and locate the Amazon Bedrock application.
- Choose Analyze Infrastructure (Figure 2).
- Use llm as a filter in the dashboard to narrow metrics to LLM-specific data sources (Figure 3), and choose the OTel LLMonitor option under the Entity types list.
The following screenshot provides an overview of Amazon Bedrock metrics in the Instana UI. In our Amazon Bedrock React Agent application, we are using Claude-v2, so only this model’s metrics are displayed. The metrics include total tokens, total cost, total requests, and more.
If you build another application using different models from Amazon Bedrock, you will be able to see metrics for those models in the Instana dashboard as well.
Tracing
You can visualize tracing information for the Amazon Bedrock React Agent application in the Instana dashboard. To access the tracing information, follow the steps below:
- Choose the Calls link in the LLM overview page (Figure 5).
- From the Analytics page, set the filter to chat_bot_service and choose the Internal trigger link (Figure 6). This will take you to the chat_bot_service tracing page.
With tracing enabled, you can view all spans for the Amazon Bedrock React Agent application. A span represents an individual unit of work, such as an LLM call or tool call, within a larger trace. As a React Agent, this application provides spans for LLM calls, tool calls, and other related activities through Instana tracing.
- Expand the Calls to view the full stack of the Amazon Bedrock React Agent application and all tags for a specified LLM call or tool call (Figure 7).
In this example, we check the details for an LLM call, including tags such as the prompt, response, total tokens, and more.
Creating Smart Alerts for Amazon Bedrock
Defining the Application
To enable Smart Alerts for the chat_bot_service, first create an Instana application and then link it to a Smart Alert.
- From the bottom right of the Applications page, choose the + ADD button to create a new application (Figure 8).
Create Smart Alert
After the application is created, you will see a summary of the application along with related metrics. However, since the application was just created, no metrics will be available immediately. You will need to wait a few minutes for data to appear.
- Choose the ADD SMART ALERT button in the bottom right to create an alert. When setting up the alert, be sure to select the Switch to Advanced Mode option (Figure 9).
- For this blog, we want to capture erroneous calls. Select Erroneous Calls to build the alert, then choose the Create button to set up the Smart Alert (Figure 10).
You will receive a notification in the Instana dashboard confirming the successful creation of the new Smart Alert, along with a link to access it directly (Figure 11).
Trigger an Alert
To trigger the alert, inject error data into the application. The simplest way is to use an invalid AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which will cause all LLM calls to fail.
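From a shell session, this can be done by exporting placeholder credentials before starting the application — the values below are deliberately invalid placeholders, and you should restore your real credentials afterwards:

```shell
# Deliberately invalid placeholder credentials so every Bedrock call fails
# and the Erroneous Calls Smart Alert fires. Restore real values afterwards.
export AWS_ACCESS_KEY_ID="INVALIDKEY"
export AWS_SECRET_ACCESS_KEY="INVALIDSECRET"
# Repeat the calls so the alert condition is met, for example:
# while :; do python3 ./bedrock-demo.py; sleep 60; done
```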
After running the Amazon Bedrock React Agent application with the invalid key multiple times, or by using a while loop, the alert will appear on the alert page (Figure 12).
Analyze Alerts
- Select one of the alerts listed under the Alerts Created list to navigate to the alert details page, which displays related events.
- Choose the Analyze Calls button to get more details on what’s wrong with your application (Figure 13).
- After selecting the Analyze Calls button, you will see all the erroneous calls. Click the Internal trigger link to view the details of the erroneous calls (Figure 14).
- By choosing one of the erroneous calls displayed in the previous image, you can view the error details. In our example, after updating the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to invalid values, Instana’s Smart Alert showed the following error message (Figure 15):
Summary
Using IBM Instana with Amazon Bedrock enhances observability and control over AI-driven applications, especially those powered by large language models. With metrics, tracing, and smart alerts, this integration ensures that Amazon Bedrock applications are effectively monitored, providing valuable insights into model performance and operational efficiency. In this blog, we demonstrated how Instana can offer deeper visibility and control over Amazon Bedrock-powered AI workloads.
Amazon Bedrock also offers additional APIs that complement the InvokeModel API used in the example described in this blog, including APIs for Agents, Prompt Flows, or Knowledge Base applications, which can also be used as entry points.
Disclaimer
The Amazon Bedrock Monitoring feature is available as a Public Preview, as outlined in the Instana Release Qualifiers in the product Documentation.
Additional Content:
- Realtime monitoring of microservices and cloud-native applications with IBM Instana SaaS on AWS
- Automate Observability for AWS with IBM Instana self-hosted
- What is IBM Instana
- Using IBM Instana for full stack observability on AWS
- AWS Partner IBM