IBM & Red Hat on AWS
Monitoring Amazon Bedrock Large Language Models with IBM Instana
With the growing adoption of large language models (LLMs) across applications like Natural Language Processing (NLP) and advanced Artificial Intelligence (AI) tasks, ensuring robust observability has become crucial for maintaining performance and reliability.
Amazon Bedrock enables you to easily build and scale generative AI applications with foundation models, while IBM Instana (Instana) provides comprehensive observability solutions tailored to monitor these applications, detect anomalies, and ensure optimal operation.
In this post, we explain how to set up observability for Amazon Bedrock applications with Instana and demonstrate its value using a practical test case.
Solution overview
Instana’s integration for Amazon Bedrock empowers organizations to monitor their AI-driven workloads with real-time metrics and distributed tracing capabilities. By leveraging Instana’s OpenTelemetry (OTel) capabilities, including Traceloop’s OpenLLMetry, you can seamlessly monitor LLM deployments like Amazon Bedrock.
With this integration, you can track key performance indicators such as:
- Model latency
- Token and cost usage
- API request volume
Additionally, you can trace the entire request lifecycle and pinpoint performance bottlenecks or failures using Instana’s distributed tracing capabilities, which are extended for Amazon Bedrock model API calls.
Reference Architecture
Instana uses OTel to enable observability for Amazon Bedrock applications. Through this integration, Instana collects telemetry data from Bedrock model API calls, supporting both Agent Mode and Agentless Mode for flexible deployment. With Instana, you can monitor key metrics and traces from Bedrock workloads, providing detailed insights into performance and request flows as seen in the following reference architecture diagram (Figure 1).
Agent Mode
With Agent Mode, data is first sent to the Instana Agent, where it is processed by the LLM sensor before being sent to the Instana Backend. This mode requires you to have an Instana Agent installed in your environment. The Instana Agent takes care of the collection, and you don’t need to set up a separate component in your environment. The Instana host agent supporting LLM collections can be deployed on Amazon Elastic Compute Cloud (EC2). For more details about host agents, refer to the IBM Instana documentation.
Agentless Mode
With Agentless Mode, data bypasses the agent and is sent directly to the Instana Backend, connecting directly with the OTel acceptor. This mode requires you to install and maintain an OTel Collector within your environment. This allows you the flexibility of an agentless environment, but you are responsible for maintenance and resource costs.
Amazon Bedrock observability setup
Let’s look at how to set up observability for Amazon Bedrock applications using Instana. We’ll use an application that simulates a ChatBot service invoking Amazon Bedrock LLMs.
After configuring Instana to monitor the application, you’ll be able to collect the following metrics:
- Token usage: Track the amount of token usage for models.
- Cost usage: Monitor the cost associated with the use of Amazon Bedrock services.
- API request count: Count and track the number of API requests sent to Amazon Bedrock.
- Model latency: Measure the response time and latency of model inference.
Moreover, you’ll utilize Instana’s Distributed Tracing capabilities for Amazon Bedrock model API calls to track the entire request lifecycle and pinpoint performance bottlenecks or failures.
Lastly, you’ll leverage Instana to create Smart Alerts based on errors detected in these API calls. Through Error Detection and Smart Alerts, you can identify specific issues, such as failures caused by an invalid AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY. Instana generates detailed error messages, including error type and root cause, to aid troubleshooting.
Prerequisites
Before you begin, ensure you have the following:
- An AWS account with Amazon Bedrock services configured.
- An Instana account. If you don’t have one, you can start a 14-day free trial from Instana’s PayPerUse offering in the AWS Marketplace.
- Instana configured with OpenTelemetry (OTel).
- IAM roles configured for AWS Lambda to access Amazon Bedrock.
- AWS Lambda attached to an Amazon VPC for access to Amazon EC2 instances for Instana and the OpenTelemetry Connector.
Cost
You are responsible for the cost of the AWS services used when deploying the solution described in this blog in your AWS account. For cost estimates, refer to the pricing pages for each AWS service or use the AWS Pricing Calculator. Some of the AWS services that may incur costs include Amazon Bedrock, AWS Lambda, and Amazon EC2.
Prepare the Amazon Bedrock application
Disclaimer: Please note that the sample code provided in this blog post is for demonstration purposes only and isn’t intended for production use. The code is not hosted and managed by AWS. We encourage you to review the code to understand how it works. For help with third-party components like IBM Instana, please refer to the vendor documentation and support channels.
For the use case described in this blog, we are using a React Agent application that invokes Amazon Bedrock. You can access the sample application code from the IBM GitHub repository. In the Python code, the application is named chat_bot_service. This name will be used later in Instana to filter and identify the Amazon Bedrock application.
To set up observability for our Amazon Bedrock sample application, we begin by enabling Traceloop, which serves as the foundation for Amazon Bedrock observability. The Traceloop SDK will instrument the Amazon Bedrock application using OpenTelemetry. Below is the relevant code snippet:
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import task, workflow
Traceloop.init(app_name="chat_bot_service")
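By default, the Traceloop SDK exports telemetry to Traceloop’s own cloud endpoint, so you’ll want to point it at your local collection endpoint instead. One way to do this is with the TRACELOOP_BASE_URL environment variable; the host and port below are assumptions for illustration — use the listen address of the Data Collector (or agent) you configure later in this post.

```shell
# Point the Traceloop SDK at a local OTel endpoint instead of the default
# Traceloop cloud endpoint. "localhost:8000" is an assumed address; use the
# otel.service.port you configure for the Data Collector (default 8000).
export TRACELOOP_BASE_URL="http://localhost:8000"
# Then start the instrumented application, for example:
# python3 ./bedrock-demo.py
```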
Next, make sure to add @workflow and @task annotations for your functions. This will help chain a multi-step process and trace it as a single unit:
@workflow(name="bedrock_chat_query")
def query(question, max_turns=3):
    logger.info(f"Query Bedrock for question: {question} ...")

@task(name="chat_bot_wiki")
def wikipedia(q):
    logger.warning("call wikipedia https://en.wikipedia.org/w/api.php ...")
    return httpx.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query",
        "list": "search",
        "srsearch": q,
        "format": "json"
    }).json()["query"]["search"][0]["snippet"]
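The workflow’s LLM step itself calls Amazon Bedrock through the AWS SDK, and those InvokeModel calls are what the Traceloop SDK instruments. The sketch below is an illustrative, minimal Claude v2 invocation — the helper names and region are our assumptions, not the sample repository’s exact code:

```python
import json

# Illustrative sketch (not the sample repository's exact code): build the
# request body that Anthropic Claude v2 expects from Bedrock's InvokeModel API.
def build_claude_body(question: str, max_tokens: int = 256) -> str:
    return json.dumps({
        "prompt": f"\n\nHuman: {question}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
    })

def ask_bedrock(question: str) -> str:
    # Requires valid AWS credentials; "bedrock-runtime" is the service
    # endpoint for model invocation, and the region here is an assumption.
    import boto3
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="anthropic.claude-v2",
        body=build_claude_body(question),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["completion"]
```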
To keep the program running in the background, you can create a script with a while loop:
while :
do
    python3 ./bedrock-demo.py
    sleep 60
done
Configuring the Instana Metric Collection for Amazon Bedrock
To collect OTel metrics from your LLM, you’ll need a dedicated host, such as an Amazon EC2 instance, to run the Data Collector. Follow these steps:
- Install Java JDK 11 or higher. Run the following command to check the installed version:
java -version
- Enable OpenTelemetry data ingestion. For more information, see Sending OpenTelemetry data to Instana.
- Download and install the Data Collector:
wget https://github.com/instana/otel-dc/releases/download/v1.0.5/otel-dc-llm-1.0.5.tar
tar xf otel-dc-llm-1.0.5.tar
- Open and modify the configuration file:
cd otel-dc-llm-1.0.5
vi config/config.yaml
- Update the following fields in the yaml file:
- Connection Mode (agentless.mode):
- Set to true to send metrics data directly to the Instana backend (agentless mode).
- Set to false to send metrics data to the Instana agent (agent mode).
- For more information on the Instana SaaS environment endpoints, see the Instana Backend otlp-acceptor Endpoints documentation.
- Backend URL (backend.url):
- In agentless mode, set this to the gRPC endpoint of the Instana backend otlp-acceptor component. Use the https:// scheme if the otlp-acceptor endpoint in your SaaS environment is TLS-enabled. For more information, see Endpoints of the Instana backend otlp-acceptor.
- In agent mode, set this to the gRPC endpoint of the Instana agent, for example http://<instana-agent-host>:4317.
- Callback Interval (callback.interval): Set the time interval (in seconds) at which data is posted to the backend or agent.
- Service Name (otel.service.name): Set a name for the Data Collector service.
- Service Port (otel.service.port): Set the listen port for the Data Collector to receive metrics data from instrumented applications (default is 8000).
- After configuring the config.yaml file, save your changes and proceed to the next step.
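Putting the fields above together, a config.yaml for agent mode might look like the fragment below. The values are placeholders and the exact file layout may differ between otel-dc-llm releases — treat this as a sketch of the fields discussed above, not the authoritative format:

```yaml
# Illustrative values only; adjust to your environment.
agentless.mode: false                          # false = send data via the Instana agent
backend.url: http://<instana-agent-host>:4317  # gRPC endpoint (agent mode)
callback.interval: 30                          # seconds between posts
otel.service.name: otel-dc-llm
otel.service.port: 8000                        # listen port for instrumented apps
```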
- Edit the properties file to customize the pricing for models in use. Use the following format in the prices.properties file:
<aiSystem>.<modelId>.input=0.0
<aiSystem>.<modelId>.output=0.0
Note: <aiSystem> is the LLM provider in use, and <modelId> can be set to an asterisk (*) to match all model IDs within the <aiSystem>. Entries with a specific model ID take priority over wildcard entries.
- For our Amazon Bedrock integration example, we are using the following configuration:
bedrock.anthropic.claude-v2.input=0.008
bedrock.anthropic.claude-v2.output=0.024
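These values match how Amazon Bedrock quotes Claude v2 pricing, per 1,000 tokens. As a rough back-of-the-envelope check of the cost metric (our arithmetic to illustrate the idea, not Instana’s actual code):

```python
# Rough cost estimate from token counts and the prices.properties values above.
# Assumption: prices are per 1,000 tokens, matching how Amazon Bedrock quotes
# Claude v2 pricing; this mirrors the idea of the cost metric, not Instana's code.
PRICES = {"bedrock.anthropic.claude-v2": {"input": 0.008, "output": 0.024}}

def estimate_cost(model_key: str, input_tokens: int, output_tokens: int) -> float:
    price = PRICES[model_key]
    return (input_tokens / 1000) * price["input"] + \
           (output_tokens / 1000) * price["output"]

# Example: 1,500 prompt tokens and 500 completion tokens:
cost = estimate_cost("bedrock.anthropic.claude-v2", 1500, 500)
print(f"${cost:.4f}")  # $0.0240
```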
- Edit the logging properties file by running the following command:
vi config/logging.properties
Note: Configure the Java logging settings in the logging.properties file according to your needs.
- Run the Data Collector in the background with the following command:
nohup ./bin/otel-dc-llm >/dev/null 2>&1 &
You can use tools like tmux or screen to run the program in the background.
Amazon Bedrock Observability in IBM Instana Console
- Log in to the Instana dashboard with your username and password.
- Navigate to the Infrastructure section and locate the Amazon Bedrock application.
- Choose Analyze Infrastructure (Figure 2).
- Use llm as a filter in the dashboard to narrow metrics to LLM-specific data sources (Figure 3), and choose the OTel LLMonitor option under the Entity types list.
The following screenshot provides an overview of Amazon Bedrock metrics in the Instana UI. In our Amazon Bedrock React Agent application, we are using Claude-v2, so only this model’s metrics are displayed. The metrics include total tokens, total cost, total requests, and more.
If you build another application using different models from Amazon Bedrock, you will be able to see metrics for those models in the Instana dashboard as well.
Tracing
You can visualize tracing information for the Amazon Bedrock React Agent application in the Instana dashboard. To access the tracing information, follow the steps below:
- Choose the Calls link in the LLM overview page (Figure 5).
- From the Analytics page, set the filter to chat_bot_service and choose the Internal trigger link (Figure 6). This will take you to the chat_bot_service tracing page.
With tracing enabled, you can view all spans for the Amazon Bedrock React Agent application. A span represents an individual unit of work, such as an LLM call or tool call, within a larger trace. As a React Agent, this application provides spans for LLM calls, tool calls, and other related activities through Instana tracing.
- Expand the Calls to view the full stack of the Amazon Bedrock React Agent application and all tags for a specified LLM call or tool call (Figure 7).
In this example, we check the details for an LLM call, including tags such as the prompt, response, total tokens, and more.
Creating Smart Alerts for Amazon Bedrock
Defining the Application
To enable Smart Alerts for the chat_bot_service, first create an Instana application and then link it to a Smart Alert.
- From the bottom right of the Applications page, choose the + ADD button to create a new application (Figure 8).
Create Smart Alert
After the application is created, you will see a summary of the application along with related metrics. However, since the application was just created, no metrics will be available immediately. You will need to wait a few minutes for data to appear.
- Choose the ADD SMART ALERT button in the bottom right to create an alert. When setting up the alert, be sure to select the Switch to Advanced Mode option (Figure 9).
- For this blog, we want to capture erroneous calls. Select Erroneous Calls to build the alert, then choose the Create button to set up the Smart Alert (Figure 10).
You will receive a notification in the Instana dashboard confirming the successful creation of the new Smart Alert, along with a link to access it directly (Figure 11).
Trigger an Alert
To trigger the alert, inject error data into the application. The simplest way is to use an invalid AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which will cause all LLM calls to fail.
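From a shell session, this can be done by exporting placeholder credentials before starting the application — the values below are deliberately invalid placeholders, and you should restore your real credentials afterwards:

```shell
# Deliberately invalid placeholder credentials so every Bedrock call fails
# and the Erroneous Calls Smart Alert fires. Restore real values afterwards.
export AWS_ACCESS_KEY_ID="INVALIDKEY"
export AWS_SECRET_ACCESS_KEY="INVALIDSECRET"
# Repeat the calls so the alert condition is met, for example:
# while :; do python3 ./bedrock-demo.py; sleep 60; done
```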
After running the Amazon Bedrock React Agent application with the invalid key multiple times, or by using a while loop, the alert will appear on the alert page (Figure 12).
Analyze Alerts
- Select one of the alerts listed under the Alerts Created list to navigate to the alert details page, which displays related events.
- Choose the Analyze Calls button to get more details on what’s wrong with your application (Figure 13).
- After selecting the Analyze Calls button, you will see all the erroneous calls. Click the Internal trigger link to view the details of the erroneous calls (Figure 14).
- By choosing one of the erroneous calls displayed in the previous image, you can view the error details. In our example, after updating the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to invalid values, Instana’s Smart Alert showed the following error message (Figure 15):
Summary
Using IBM Instana with Amazon Bedrock enhances observability and control over AI-driven applications, especially those powered by large language models. With metrics, tracing, and smart alerts, this integration ensures that Amazon Bedrock applications are effectively monitored, providing valuable insights into model performance and operational efficiency. In this blog, we demonstrated how Instana can offer deeper visibility and control over Amazon Bedrock-powered AI workloads.
Amazon Bedrock also offers additional APIs that complement the InvokeModel API used in the example described in this blog, including APIs for Agents, Prompt Flows, or Knowledge Base applications, which can also be used as entry points.
Disclaimer
The Amazon Bedrock Monitoring feature is available as a Public Preview, as outlined in the Instana Release Qualifiers in the product Documentation.
Additional Content:
- Realtime monitoring of microservices and cloud-native applications with IBM Instana SaaS on AWS
- Automate Observability for AWS with IBM Instana self-hosted
- What is IBM Instana
- Using IBM Instana for full stack observability on AWS
- AWS Partner IBM