Amazon Bedrock Pricing
Pricing overview
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
With Amazon Bedrock, you will be charged for model inference and customization. You have a choice of two pricing plans for inference:
1. On-Demand and Batch: This mode allows you to use FMs on a pay-as-you-go basis without having to make any time-based term commitments.
2. Provisioned Throughput: This mode allows you to provision sufficient throughput to meet your application's performance requirements in exchange for a time-based term commitment.
Pricing Models
Tools
Pricing Details
Pricing is dependent on the modality, provider, and model. Please select the model provider to see detailed pricing.
Amazon Bedrock offers select foundation models (FMs) from leading AI providers like Anthropic, Meta, Mistral AI, and Amazon for batch inference at a 50% lower price compared to on-demand inference pricing. Please refer to model list here.
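As a rough illustration of the 50% batch discount described above, the following Python sketch derives a batch price from an on-demand price. The function name and the default discount parameter are assumptions made for illustration. The per-model tables on this page remain the reference, since some models have no batch price and not every listed batch price is exactly half of the on-demand price.

```python
def batch_price(on_demand_price_per_1k: float, discount: float = 0.50) -> float:
    """Return an estimated batch price per 1,000 tokens.

    Assumes the flat 50% discount stated on this page; the per-model
    tables take precedence where they differ.
    """
    return on_demand_price_per_1k * (1 - discount)


# Claude 3 Haiku input tokens: $0.00025 on demand
print(f"${batch_price(0.00025):.6f}")  # $0.000125
```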
-
AI21 Labs
On-Demand pricing
AI21 Labs models | Price per 1,000 input tokens | Price per 1,000 output tokens
Jamba 1.5 Large | $0.002 | $0.008
Jamba 1.5 Mini | $0.0002 | $0.0004
Jurassic-2 Mid | $0.0125 | $0.0125
Jurassic-2 Ultra | $0.0188 | $0.0188
Jamba-Instruct | $0.0005 | $0.0007
-
Amazon
Other Amazon models | Price
Amazon Rerank 1.0 | $1.00 per 1,000 queries**
**You are charged for the number of queries, where a query can contain up to 100 document chunks. If a query contains more than 100 document chunks, it is counted as multiple queries. For example, a request that contains 350 documents is treated as 4 queries. Each document can contain up to 512 tokens (inclusive of the query and document’s total tokens); if the token length is higher than 512 tokens, it is broken down into multiple documents.
-
Anthropic
On-Demand and Batch pricing
Region: US East (N. Virginia) and US West (Oregon)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude 3.5 Sonnet** | $0.003 | $0.015 | $0.0015 | $0.0075
Claude 3.5 Haiku | $0.0008 | $0.004 | $0.0005 | $0.0025
Claude 3 Opus* | $0.015 | $0.075 | $0.0075 | $0.0375
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Claude 3 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Claude 2.1 | $0.008 | $0.024 | N/A | N/A
Claude 2.0 | $0.008 | $0.024 | N/A | N/A
Claude Instant | $0.0008 | $0.0024 | N/A | N/A
*Claude 3 Opus is currently available in the US West (Oregon) Region.
**Pricing for Claude 3.5 Sonnet is applicable to each version of Claude 3.5 Sonnet (v1 and v2). Claude 3.5 Sonnet v2 is currently available in the US West (Oregon) Region.
Region: Europe (London)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude 3 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Region: Europe (Zurich)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude 3.5 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Region: South America (São Paulo)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude 3 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Region: Canada (Central)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude 3 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Region: Asia Pacific (Mumbai)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude 3 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Region: Asia Pacific (Sydney)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude 3 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Region: Asia Pacific (Tokyo)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude Instant | $0.0008 | $0.0024 | N/A | N/A
Claude 2.0/2.1 | $0.008 | $0.024 | N/A | N/A
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Claude 3.5 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Region: Asia Pacific (Singapore)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude Instant | $0.0008 | $0.0024 | $0.0004 | $0.0012
Claude 2.0/2.1 | $0.008 | $0.024 | $0.004 | $0.012
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Claude 3.5 Sonnet | $0.003 | $0.015 | N/A | N/A
Region: Europe (Paris)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Claude 3 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Region: Europe (Frankfurt)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude Instant | $0.0008 | $0.0024 | N/A | N/A
Claude 2.0/2.1 | $0.008 | $0.024 | N/A | N/A
Claude 3 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Claude 3.5 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075
Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625
Region: Asia Pacific (Seoul)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude 3.5 Sonnet | $0.003 | $0.015 | N/A | N/A
Claude 3 Haiku | $0.00025 | $0.00125 | N/A | N/A
Region: US East (Ohio)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch)
Claude 3.5 Sonnet | $0.003 | $0.015 | N/A | N/A
Claude 3 Haiku | $0.00025 | $0.00125 | N/A | N/A
Latency Optimized Inference
Region: US East (Ohio)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens
Claude 3.5 Haiku | $0.001 | $0.005
Provisioned Throughput pricing
Region: US East (N. Virginia) and US West (Oregon)
Anthropic models | Price per hour per model with no commitment | Price per hour per model unit for 1-month commitment | Price per hour per model unit for 6-month commitment
Claude Instant | $44.00 | $39.60 | $22.00
Claude 2.0/2.1 | $70.00 | $63.00 | $35.00
Region: Asia Pacific (Tokyo)
Anthropic models | Price per hour per model unit for 1-month commitment | Price per hour per model unit for 6-month commitment
Claude Instant | $53.00 | $29.00
Claude 2.0/2.1 | $86.00 | $48.00
Region: Europe (Frankfurt)
Anthropic models | Price per hour per model unit for 1-month commitment | Price per hour per model unit for 6-month commitment
Claude Instant | $49.00 | $27.00
Claude 2.0/2.1 | $79.00 | $44.00
Please reach out to your AWS account team for more details on model units.
-
Cohere
On-Demand pricing
Cohere models | Price per 1,000 input tokens | Price per 1,000 output tokens
Command | $0.0015 | $0.0020
Command-Light | $0.0003 | $0.0006
Command R+ | $0.0030 | $0.0150
Command R | $0.0005 | $0.0015
Embed - English | $0.0001 | N/A
Embed - Multilingual | $0.0001 | N/A
Cohere models | Price per 1,000 queries**
Rerank 3.5 | $2.00
**You are charged for the number of queries, where a query can contain up to 100 document chunks. If a query contains more than 100 document chunks, it is counted as multiple queries. For example, a request that contains 350 documents is treated as 4 queries. Each document can contain up to 500 tokens (inclusive of the query and document’s total tokens); if the token length is higher than 512 tokens, it is broken down into multiple documents.
Pricing for customization (fine-tuning)
Cohere models | Price to train 1,000 tokens | Price to store each custom model per month | Price to infer from a custom model per model unit per hour (with no-commit Provisioned Throughput pricing)
Cohere Command | $0.004 | $1.95 | $49.50
Cohere Command-Light | $0.001 | $1.95 | $8.56
*Total tokens trained = number of tokens in training data corpus x number of epochs
Provisioned Throughput pricing
Cohere models | Price per hour per model with no commitment | Price per hour per model unit for 1-month commitment | Price per hour per model unit for 6-month commitment
Cohere Command | $49.50 | $39.60 | $23.77
Cohere Command - Light | $8.56 | $6.85 | $4.11
Embed - English | $7.12 | $6.76 | $6.41
Embed - Multilingual | $7.12 | $6.76 | $6.41
Please reach out to your AWS account or sales team for more details on model units.
-
Meta Llama
Llama 2
On-Demand pricing
Region: US East (N. Virginia) and US West (Oregon)
Meta models | Price per 1,000 input tokens | Price per 1,000 output tokens
Llama 2 Chat (13B) | $0.00075 | $0.001
Llama 2 Chat (70B) | $0.00195 | $0.00256
Pricing for model customization (fine-tuning)
Meta models | Price to train 1,000 tokens | Price to store each custom model* per month | Price to infer from a custom model for 1 model unit per hour (with no-commit Provisioned Throughput pricing)
Llama 2 Pretrained (13B) | $0.00149 | $1.95 | $23.50
Llama 2 Pretrained (70B) | $0.00799 | $1.95 | $23.50
*Custom model storage = $1.95
Provisioned Throughput pricing
Meta models | Price per hour per model unit for 1-month commitment | Price per hour per model unit for 6-month commitment
Llama 2 Pretrained and Chat (13B) | $21.18 | $13.08
Llama 2 Pretrained (70B) | $21.18 | $13.08
*Llama 2 Pre-trained models are available only in provisioned throughput after customization.
Please reach out to your AWS account or sales team for more details on model units.
-
Mistral AI
-
Stability AI
On-Demand pricing
Stability AI model | Price per generated image
Stable Image Core | $0.04
SD3 Large | $0.08
Stable Image Ultra | $0.14
Previous generation of image models offered by Stability AI are priced per image, depending on step count and image resolution.
Stability AI model | Image resolution | Price per image generated for standard quality (<=50 steps) | Price per image generated for premium quality (>50 steps)
SDXL 1.0 | Up to 1024 x 1024 | $0.04 | $0.08
Provisioned Throughput pricing
Stability AI model | Price per hour per model unit for 1-month commitment* | Price per hour per model unit for 6-month commitment*
SDXL 1.0 | $49.86 | $46.18
*Includes inference for base and custom models
Please reach out to your AWS account or sales team for more details on model units.
Currently, model customization (fine-tuning) is not supported for Stability AI models on Amazon Bedrock.
-
Custom Model Import
-
Llama
-
Regions: US East (N. Virginia) and US West (Oregon)
Custom Model Unit version | Price per Custom Model Unit per min* | Monthly storage cost per Custom Model Unit
v1.0 | $0.0785 | $1.95
The Custom Model Units needed to host a model depend on a variety of factors, notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, a Llama 3.1 8B 128K model requires 2 Custom Model Units, and a Llama 3.1 70B 128K model requires 8 Custom Model Units.
*Billed in 5 minute windows
-
Multimodal Llama
-
Regions: US East (N. Virginia) and US West (Oregon)
Custom Model Unit version | Price per Custom Model Unit per min* | Monthly storage cost per Custom Model Unit
v1.0 | $0.0785 | $1.95
The Custom Model Units needed to host a model depend on a variety of factors, notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, a Llama 3.2 11B 128K model requires 4 Custom Model Units.
*Billed in 5 minute windows
-
Mistral
-
Regions: US East (N. Virginia) and US West (Oregon)
Custom Model Unit version | Price per Custom Model Unit per min* | Monthly storage cost per Custom Model Unit
v1.0 | $0.0785 | $1.95
The Custom Model Units needed to host a model depend on a variety of factors, notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, a Mistral 7B 32K model requires 1 Custom Model Unit.
*Billed in 5 minute windows
-
Mixtral
-
Regions: US East (N. Virginia) and US West (Oregon)
Custom Model Unit version | Price per Custom Model Unit per min* | Monthly storage cost per Custom Model Unit
v1.0 | $0.0785 | $1.95
The Custom Model Units needed to host a model depend on a variety of factors, notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, a Mixtral 8x7B 32K model requires 4 Custom Model Units.
*Billed in 5 minute windows
-
Flan
-
Regions: US East (N. Virginia) and US West (Oregon)
Custom Model Unit version | Price per Custom Model Unit per min* | Monthly storage cost per Custom Model Unit
v1.0 | $0.0785 | $1.95
The Custom Model Units needed to host a model depend on a variety of factors, notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, a Flan-T5 XL 512 model requires 1 Custom Model Unit.
*Billed in 5 minute windows
On-Demand Inference Pricing:
You are billed in 5-minute windows for the duration your model copy is active, starting from the first successful invocation. The maximum throughput and concurrency limit per model copy depend on factors such as input/output token mix, hardware type, model size, architecture, and inference optimizations, and are determined during the model import workflow. Bedrock automatically scales the number of model copies depending on your usage patterns. If there are no invocations for a 5-minute period, Bedrock will scale down to zero and scale back up when you invoke your model. While scaling back up, you may experience a cold-start duration (in tens of seconds) depending on model size. Bedrock also scales up the number of model copies if your inference volume consistently exceeds the concurrency limits of a single model copy.
Note: There is a default maximum of 3 model copies per account per imported model that can be increased through Service Quotas.
-
Pricing Tools (Details)
-
Flows
-
Amazon Bedrock Flows
You are charged based on the number of node transitions required to execute your application. Bedrock Flows counts a node transition each time a node in your workflow is executed. You are charged for the total number of node transitions across all your flows.
All charges are metered daily and billed monthly starting February 1st, 2025.
Price per 1,000 node transitions | $0.035
Additional Charges
You may incur additional charges if the execution of your application workflow utilizes other AWS services or transfers data. For example, if your workflow invokes an Amazon Bedrock Guardrail policy, you will be billed for the number of text units processed by the policy.
-
Knowledge Bases
-
Rerank models
Rerank models are designed to improve the relevance and accuracy of responses in Retrieval Augmented Generation (RAG) applications. They are charged per request.
You are charged for the number of queries, where a query can contain up to 100 document chunks. If a query contains more than 100 document chunks, it is counted as multiple queries. For example, a request that contains 350 documents is treated as 4 queries. Each document can contain up to 512 tokens (inclusive of the query and document’s total tokens); if the token length is higher than 512 tokens, it is broken down into multiple documents. A query is equivalent to a search unit.
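The query-counting rule above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, it assumes every document already fits within the token limit (so one document equals one chunk), and treating an empty request as one query is an assumption. The $1.00 per 1,000 queries rate is the Amazon Rerank 1.0 price listed on this page.

```python
import math

DOCS_PER_QUERY = 100  # a query can contain up to 100 document chunks


def billed_queries(num_documents: int) -> int:
    """Number of billable queries for a single rerank request."""
    # Assumption: a request with zero documents still counts as one query.
    return max(1, math.ceil(num_documents / DOCS_PER_QUERY))


def rerank_cost(num_documents: int, price_per_1000_queries: float) -> float:
    """Cost of one rerank request at the given per-1,000-queries rate."""
    return billed_queries(num_documents) * price_per_1000_queries / 1000


print(billed_queries(350))      # 4
print(rerank_cost(350, 1.00))   # 0.004
```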
-
Amazon Bedrock Guardrails
On-Demand pricing
Guardrail policy* | Price per 1,000 text units**
Content filters | $0.75
Denied topics | $1
Contextual grounding check*** | $0.1
Sensitive information filter (PII) | $0.1
Sensitive information filter (regular expression) | Free
Word filters | Free
* Each guardrail policy is optional and can be enabled based on your application requirements. Charges will be incurred based on the policy type used in the guardrail. For example, if a guardrail is configured with content filters and denied topics, charges will be incurred for these two policies, while there will be no charges associated with sensitive information filters.
**A text unit can contain up to 1000 characters. If a text input is more than 1000 characters, it is processed as multiple text units, each containing 1000 characters or less. For example, if a text input contains 5600 characters, it will be charged for 6 text units.
*** Contextual grounding check uses a reference source and a query to determine if the model response is grounded based on the source and relevant to the query. The total number of text units charged is calculated by combining all the characters in the source, query, and model response.
Guardrails are not supported for images and embeddings.
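The text-unit rule and per-policy prices above can be combined into a small estimator. The Python sketch below is illustrative only: the dictionary keys and function names are made up for this example, and the prices are copied from the Guardrails table on this page.

```python
import math

CHARS_PER_TEXT_UNIT = 1000

# Price per 1,000 text units, as listed on this page (keys are hypothetical).
PRICE_PER_1000_TEXT_UNITS = {
    "content_filters": 0.75,
    "denied_topics": 1.00,
    "contextual_grounding": 0.10,
    "pii_filter": 0.10,
    "regex_filter": 0.0,
    "word_filters": 0.0,
}


def text_units(num_chars: int) -> int:
    """Text units charged for an input of num_chars characters."""
    return math.ceil(num_chars / CHARS_PER_TEXT_UNIT)


def guardrail_cost(units: int, policies: list[str]) -> float:
    """Cost of processing `units` text units with the enabled policies."""
    rate = sum(PRICE_PER_1000_TEXT_UNITS[p] for p in policies)
    return units * rate / 1000


print(text_units(5600))  # 6
print(guardrail_cost(3000, ["content_filters", "denied_topics"]))  # 5.25
```

Only the policies passed in are charged, which mirrors the note that each guardrail policy is optional and billed separately.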
-
Model Evaluation
Model evaluation is charged for the inference from your choice of model. Automatically-generated algorithmic scores are provided at no extra charge. For human-based evaluation where you bring your own workstream, you are charged for the model inference in the evaluation, and a charge of $0.21 per completed human task.
Model | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per human task
Model selected for evaluation | Based on model selected | Based on model selected | $0.21
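As a sketch of how these charges combine for a human-based evaluation job, the Python snippet below adds inference cost for each evaluated model to the per-task human charge. The function names and the token counts are illustrative assumptions; the per-token prices are the Claude Instant and Claude 2.1 on-demand rates listed on this page, and one human task is counted per prompt per worker.

```python
HUMAN_TASK_PRICE = 0.21  # per completed human task


def inference_cost(input_tokens, output_tokens,
                   input_price_per_1k, output_price_per_1k):
    """On-demand inference cost for one model in the evaluation."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


def evaluation_cost(model_runs, prompts, workers_per_prompt):
    """Total cost: inference for every model plus human tasks."""
    human_tasks = prompts * workers_per_prompt
    inference = sum(inference_cost(*run) for run in model_runs)
    return inference + human_tasks * HUMAN_TASK_PRICE


runs = [
    (5_000, 15_000, 0.0008, 0.0024),  # Claude Instant (illustrative token counts)
    (5_000, 20_000, 0.008, 0.024),    # Claude 2.1 (illustrative token counts)
]
total = evaluation_cost(runs, prompts=50, workers_per_prompt=1)
print(f"${total:.2f}")  # $11.06
```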
Pricing examples
-
AI21 labs
An application developer makes the following API calls to Amazon Bedrock: a request to AI21’s Jurassic-2 Mid model to summarize an input of 10K tokens of input text to an output of 2K tokens.
Total cost incurred = 10K tokens/1000 * $0.0125 + 2K tokens/1000 * $0.0125 = $0.15
-
Amazon
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Amazon Titan Text Lite model to summarize an input of 2K tokens of input text to an output of 1K tokens.
Total hourly cost incurred is = 2K tokens/1000 * $0.0003 + 1K tokens/1000 * $0.0004 = $0.001.
An application developer makes the following API calls to Amazon Bedrock: a request to the Amazon Titan Image Generator base model to generate 1000 images of 1024 x 1024 in size of standard quality.
Total cost incurred = 1000 images * $0.01 per image = $10
Customization (fine-tuning and continued pretraining) pricing
An application developer customizes an Amazon Titan Image Generator model using 1000 image-text pairs. After training, the developer uses custom model provisioned throughput for 1 hour to evaluate the performance of the model. The fine-tuned model is stored for 1 month. After evaluation, the developer uses provisioned throughput (1-month commitment term) to host the customized model.
Monthly cost incurred for fine-tuning = fine-tuning training ($0.005 * 500 * 64), where $0.005 is the price per image seen, 500 is the number of steps, and 64 is the batch size, + custom model storage per month ($1.95) + 1 hour of custom model inference ($21) = $160 + $1.95 + $21 = $182.95
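The fine-tuning arithmetic above can be expressed as a short Python sketch. The function and parameter names are hypothetical; the values are the ones used in this example (price per image seen, steps, batch size, storage, and one hour of custom model inference).

```python
def image_finetune_cost(price_per_image_seen: float, steps: int, batch_size: int,
                        storage_per_month: float, inference_hourly: float,
                        inference_hours: float = 1.0,
                        storage_months: float = 1.0) -> float:
    """Training + storage + custom model inference for an image model."""
    # Images seen during training = steps * batch size.
    training = price_per_image_seen * steps * batch_size
    storage = storage_per_month * storage_months
    inference = inference_hourly * inference_hours
    return training + storage + inference


total = image_finetune_cost(0.005, 500, 64, 1.95, 21.00)
print(f"${total:.2f}")  # $182.95
```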
Provisioned Throughput pricing
An application developer buys two model units of Amazon Titan Text Express with a 1-month commitment for their text summarization use case.
Total monthly cost incurred = 2 model units * $18.40/hour * 24 hours * 31 days = $27,379.20
An application developer buys one model unit of the base Amazon Titan Image Generator model with a 1-month commitment.
Total cost incurred = 1 model unit * $16.20 * 24 hours * 31 days = $12,052.80
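Both Provisioned Throughput examples follow the same formula: model units times hourly price times hours in the month. A minimal Python sketch, assuming a 31-day month as in the examples (the function name is hypothetical):

```python
def provisioned_monthly_cost(model_units: int, hourly_price_per_unit: float,
                             days: int = 31, hours_per_day: int = 24) -> float:
    """Monthly cost of Provisioned Throughput for a number of model units."""
    return model_units * hourly_price_per_unit * hours_per_day * days


# Two model units of Amazon Titan Text Express at $18.40/hour
print(f"${provisioned_monthly_cost(2, 18.40):,.2f}")  # $27,379.20
# One model unit of Amazon Titan Image Generator at $16.20/hour
print(f"${provisioned_monthly_cost(1, 16.20):,.2f}")  # $12,052.80
```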
-
Anthropic
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock in the US West (Oregon) Region: a request to Anthropic’s Claude model to summarize an input of 11K tokens of input text to an output of 4K tokens.
Total cost incurred = 11K tokens/1000 * $0.008 + 4K tokens/1000 * $0.024 = $0.088 + $0.096 = $0.184
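The on-demand calculation above generalizes to any model priced per 1,000 tokens. The following Python sketch uses a hypothetical helper name and reproduces the figures from this example:

```python
def on_demand_cost(input_tokens: int, output_tokens: int,
                   input_price_per_1k: float, output_price_per_1k: float) -> float:
    """On-demand cost of a single request, priced per 1,000 tokens."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


# 11K input tokens and 4K output tokens at $0.008 / $0.024 per 1,000 tokens
cost = on_demand_cost(11_000, 4_000, 0.008, 0.024)
print(f"${cost:.3f}")  # $0.184
```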
Provisioned Throughput pricing
An application developer buys one model unit of Anthropic Claude Instant with a 1-month commitment in the US West (Oregon) Region:
Total monthly cost incurred = 1 model unit * $39.60 * 24 hours * 31 days = $29,462.40
-
Cohere
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock: a request to Cohere’s Command model to summarize an input of 6K tokens of input text to an output of 2K tokens.
Total cost incurred = 6K tokens/1,000 * $0.0015 + 2K tokens/1,000 * $0.0020 = $0.013
An application developer makes the following API calls to Amazon Bedrock: A request to Cohere’s Command - Light model to summarize an input of 6K tokens of input text to an output of 2K tokens.
Total cost incurred = 6K tokens/1000 * $0.0003 + 2K tokens/1000 * $0.0006 = $0.003
An application developer makes the following API calls to Amazon Bedrock: A request to either Cohere’s Embed English or Embed Multilingual model to generate embeddings for 10K tokens of input.
Total cost incurred = 10K tokens/1000 * $0.0001 = $0.001
Customization (fine-tuning) pricing
An application developer customizes a Cohere Command model using 1000 tokens of data. After training, the developer uses custom model provisioned throughput for 1 hour to evaluate the performance of the model. The fine-tuned model is stored for 1 month. After evaluation, the developer uses provisioned throughput (1-month commitment) to host the customized model.
Monthly cost incurred for fine-tuning = Fine-tuning training ($0.004 * 1000) + custom model storage per month ($1.95) + 1 hour of custom model inference ($49.50) = $55.45
Hourly cost incurred for provisioned throughput (1-month commitment) of custom model = $39.60 per model unit
Provisioned Throughput pricing
An application developer buys one model unit of Cohere Command with a 1-month commitment for their text summarization use case.
Total monthly cost incurred = 1 model unit * $39.60 * 24 hours * 31 days = $29,462.40
-
Meta Llama
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock: a request to Meta’s Llama 2 Chat (13B) model to summarize an input of 2K tokens of input text to an output of 500 tokens.
Total cost incurred = 2K tokens/1000 * $0.00075 + 500 tokens/1000 * $0.001 = $0.002
Customization (fine-tuning) pricing
An application developer customizes the Llama 2 Pretrained (70B) model using 1000 tokens of data. After training, the developer uses custom model provisioned throughput for 1 hour to evaluate the performance of the model. The fine-tuned model is stored for 1 month. After evaluation, the developer uses provisioned throughput (1-month commitment) to host the customized model.
Monthly cost incurred for fine-tuning = Fine tuning training ($0.00799 * 1000) + custom model storage per month ($1.95) + 1 hour of custom model inference ($23.50) = $33.44
Hourly cost incurred for provisioned throughput (1-month commitment) of custom model = $21.18 per model unit
Provisioned Throughput pricing
An application developer buys one model unit of Meta Llama 2 with a 1-month commitment for their text summarization use case.
Total monthly cost incurred = 1 model unit * $21.18 * 24 hours * 31 days = $15,757.92
-
Mistral AI
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Mistral 7B model to summarize an input of 2K tokens of input text to an output of 1K tokens.
Total hourly cost incurred = 2K tokens/1000 * $0.00015 + 1K tokens/1000 * $0.0002 = $0.0005
An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Mixtral 8x7B model to summarize an input of 2K tokens of input text to an output of 1K tokens.
Total hourly cost incurred = 2K tokens/1000 * $0.00045 + 1K tokens/1000 * $0.0007 = $0.0016
An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Mistral Large model to summarize an input of 2K tokens of input text to an output of 1K tokens.
Total hourly cost incurred = 2K tokens/1000 * $0.008 + 1K tokens/1000 * $0.024 = $0.04
-
Stability AI
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock: a request to the SDXL model to generate a 512 x 512 image with a step count of 70 (premium quality).
Total cost incurred = 1 image * $0.036 per image = $0.036
An application developer makes the following API calls to Amazon Bedrock: a request to the SDXL 1.0 model to generate a 1024 x 1024 image with a step count of 70 (premium quality).
Total cost incurred = 1 image * $0.08 per image = $0.08
Provisioned Throughput pricing
An application developer buys one model unit of SDXL 1.0 with a 1-month commitment.
Total cost incurred = 1 * $49.86 * 24 hours * 31 days = $37,095.84
-
Model evaluation
Model evaluation example 1:
On-demand pricing
An application developer submits a dataset for human-based model evaluation using Anthropic Claude 2.1 and Anthropic Claude Instant in the US East (N. Virginia) AWS Region. The dataset contains 50 prompts, and the developer requires one worker to rate each prompt-response set (configurable in the evaluation job creation as “workers per prompt” parameter).
There will be 50 tasks in this evaluation job (one task for each prompt-response set per each worker). The 50 prompts combine to 5000 input tokens, and the associated responses combine to 15,000 tokens for Anthropic Claude Instant and 20,000 tokens for Anthropic Claude 2.1.
The following charges are incurred for this model evaluation job:
Item | Number of input tokens | Price per 1000 input tokens | Cost of input | Number of output tokens | Price per 1000 output tokens | Cost of output | Number of human tasks | Price per human task | Cost of human tasks | Total
Claude Instant Inference | 5000 | $0.0008 | $0.004 | 15000 | $0.0024 | $0.036 | - | - | - | $0.04
Claude 2.1 Inference | 5000 | $0.008 | $0.04 | 20000 | $0.024 | $0.48 | - | - | - | $0.52
Human Tasks | - | - | - | - | - | - | 50 | $0.21 | $10.50 | $10.50
Total | - | - | - | - | - | - | - | - | - | $11.06
Model evaluation example 2:
On-demand pricing
An application developer submits a dataset for human-based model evaluation using Anthropic Claude 2.1 and Anthropic Claude Instant in the US East (N. Virginia) AWS Region.
The dataset contains 50 prompts, and the developer requires two workers to rate each prompt-response set (configurable in the evaluation job creation as “workers per prompt” parameter). There will be 100 tasks in this evaluation job (1 task for each prompt-response set per each worker: 2 workers x 50 prompt-response sets = 100 human tasks).
The 50 prompts combine to 5000 input tokens, and the associated responses combine to 15000 tokens for Anthropic Claude Instant and 20000 tokens for Anthropic Claude 2.1.
The following charges are incurred for this model evaluation job:
Item | Number of input tokens | Price per 1000 input tokens | Cost of input | Number of output tokens | Price per 1000 output tokens | Cost of output | Number of human tasks | Price per human task | Cost of human tasks | Total
Claude Instant Inference | 5000 | $0.0008 | $0.004 | 15000 | $0.0024 | $0.036 | - | - | - | $0.04
Claude 2.1 Inference | 5000 | $0.008 | $0.04 | 20000 | $0.024 | $0.48 | - | - | - | $0.52
Human Tasks | - | - | - | - | - | - | 100 | $0.21 | $21.00 | $21.00
Total | - | - | - | - | - | - | - | - | - | $21.56
-
Amazon Bedrock Guardrails
Example 1: Customer support chatbot
An application developer creates a customer support chatbot and uses content filters to block harmful content and denied topics to filter undesirable queries and responses.
The chatbot serves 1000 user queries per hour. Each user query has an average input length of 200 characters and receives a FM response of 1500 characters.
Each user query of 200 characters corresponds to 1 text unit.
Each FM response of 1,500 characters corresponds to 2 text units.
Text units processed each hour = (1 + 2) * 1000 queries = 3000 text units
Total cost incurred per hour for content filters and denied topics = 3000 * ($0.75 + $1.00) / 1000 = $5.25
Example 2: Call center transcript summarization
An application developer creates an application to summarize chat transcripts between users and support agents. It uses sensitive information filter to redact personally identifiable information (PII) in the generated summaries for 10,000 conversations.
Each generated summary has an average of 3,500 characters that corresponds to 4 text units.
Total cost incurred to summarize 10,000 conversations = 10000 * 4 * ($0.1/1000) = $4
-
Custom Model Import
Pricing Example: An application developer imports a customized Llama 3.1 model with 8B parameters and a 128K sequence length in the us-east-1 Region and deletes the model after 1 month. This requires 2 Custom Model Units, so the price per minute is $0.1570 (2 x $0.0785). The model storage cost for 2 Custom Model Units is $3.90 for the month.
There is no charge to import the model. The first successful invocation is at 8:03 AM, at which time the metering starts. The 5-minute metering windows are from 8:03 AM - 8:07 AM; 8:07 AM - 8:11 AM, and so on. If there is at least one invocation during any 5-minute period, the window will be considered active for billing. If there is no invocation from 8:07 AM - 8:11 AM, the metering will stop at 8:11 AM. In this case, the bill would be calculated as follows: $0.1570 * 5 minutes * 3 five minute windows = $2.355.
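The billing arithmetic in this example can be sketched in Python as follows. The constant and function names are hypothetical; the rates are the Custom Model Unit prices listed on this page, and the number of active 5-minute windows is taken as an input rather than derived from invocation timestamps.

```python
PRICE_PER_CMU_MINUTE = 0.0785   # price per Custom Model Unit per minute
STORAGE_PER_CMU_MONTH = 1.95    # monthly storage cost per Custom Model Unit
WINDOW_MINUTES = 5              # inference is billed in 5-minute windows


def import_inference_cost(custom_model_units: int, active_windows: int) -> float:
    """Inference cost for the given number of active 5-minute windows."""
    per_minute = custom_model_units * PRICE_PER_CMU_MINUTE
    return per_minute * WINDOW_MINUTES * active_windows


def import_storage_cost(custom_model_units: int, months: float = 1.0) -> float:
    """Storage cost for an imported model."""
    return custom_model_units * STORAGE_PER_CMU_MONTH * months


# Llama 3.1 8B 128K: 2 Custom Model Units, 3 active windows, stored 1 month
print(f"${import_inference_cost(2, 3):.3f}")  # $2.355
print(f"${import_storage_cost(2):.2f}")       # $3.90
```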
-
Amazon Bedrock Knowledge Bases
Pricing Example 1 (Reranking using Amazon Rerank 1.0 model)
In a given month, you make 2 million requests to Rerank API using Amazon Rerank 1.0 model – 1 million requests contain fewer than 100 documents each and hence will be charged for one request each. The remaining 1 million requests contain 120-150 documents, and hence each request will be charged for 2 requests.
Price for one request = $0.001
Total charge = 1,000,000 * $0.001 + 1,000,000 * 2 * $0.001 = $3,000
-
Flows
Example: News summarization
An application developer creates a flow to automate news summarization for traders. The flow includes an Input node that takes in an S3 location and an S3 retrieval node that retrieves 10 files containing articles from 10 major news agencies in S3 (2 node transitions). It then uses an iterator node to invoke a model with a prompt node to summarize each file (+ 10 files x 2 node transitions). It then collects all the results using a collector node, writes the results to S3 using an S3 storage node, and completes in an Output node (+ 3 node transitions). They run this flow every half hour of every weekday.
The number of node transitions per flow execution is: 2 + 10*2 + 3 = 25 node transitions/flow execution
The number of flow executions per month is: 24 hours * 2 * 5 days * 4 weeks = 960 flow executions/month.
Total monthly bill is: 25 * 960 * $0.035/1000 = $0.84
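The monthly bill above follows directly from the node-transition price. A minimal Python sketch, with hypothetical names and the figures used in this example (25 node transitions per execution, one execution every half hour on weekdays over four weeks):

```python
PRICE_PER_1000_NODE_TRANSITIONS = 0.035


def flows_monthly_cost(transitions_per_execution: int,
                       executions_per_month: int) -> float:
    """Monthly Bedrock Flows charge for node transitions only."""
    total_transitions = transitions_per_execution * executions_per_month
    return total_transitions * PRICE_PER_1000_NODE_TRANSITIONS / 1000


# 24 hours * 2 runs per hour * 5 weekdays * 4 weeks
executions = 24 * 2 * 5 * 4
print(executions)                                   # 960
print(f"${flows_monthly_cost(25, executions):.2f}")  # $0.84
```

The estimate excludes charges from other services invoked by the flow, such as model inference in the prompt node.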
Additional charges
The bill will also include additional charges for AWS services used in the workflow execution, including Amazon S3 usages in the retrieval and storage nodes, and Amazon Bedrock foundation model usage in the prompt node.