Amazon Bedrock prompt caching
Overview
Many foundation model (FM) use cases reuse certain portions of prompts (prefixes) across API calls. With prompt caching, supported models let you cache these repeated prompt prefixes between requests, so the model can skip recomputing matching prefixes. As a result, prompt caching in Amazon Bedrock can reduce costs by up to 90% and latency by up to 85% for supported models.
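For example, with the Converse API you mark where the reusable prefix ends by inserting a cache point. The following is a minimal sketch in Python with boto3; the model ID and system prompt are placeholders, and you should substitute any model that supports prompt caching.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# A long, reusable set of instructions shared across requests (placeholder).
LONG_SYSTEM_PROMPT = "..."

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # example; use any caching-supported model
    system=[
        {"text": LONG_SYSTEM_PROMPT},
        # Everything before this cache point is cached and reused on later requests.
        {"cachePoint": {"type": "default"}},
    ],
    messages=[
        {"role": "user", "content": [{"text": "Summarize the attached policy."}]},
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```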
Improve performance for multiple use cases
Many applications either require or benefit from long prompts, such as document Q&A, code assistants, agentic search, or long-form chat. Even with the most intelligent foundation models, you often need extensive prompts with detailed instructions and many-shot examples to achieve the right results for your use case. However, long prompts reused across API calls can increase average latency. With prompt caching, the internal model state does not need to be recomputed when the prompt prefix is already cached. This saves processing time, resulting in lower response latencies.
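In a document Q&A workflow, for instance, you can place the cache point inside the message content, after a long document, so that only the question varies between calls. A sketch under the same assumptions as above (placeholder document text and model ID):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Long document text, identical on every request (placeholder).
DOCUMENT = "..."

def ask(question: str) -> str:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # example model ID
        messages=[{
            "role": "user",
            "content": [
                {"text": DOCUMENT},
                {"cachePoint": {"type": "default"}},  # cache the document prefix
                {"text": question},                   # only this part varies
            ],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]

# The first call writes the prefix to the cache; later calls read it back,
# so the model skips recomputing the document for each new question.
print(ask("What is the termination clause?"))
print(ask("Who are the parties to this agreement?"))
```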
Reduce cost associated with long, repeated prompts
With prompt caching, you can cache the relevant portions of your prompt to save on input token costs. The cache is specific to your account and comprises the internal model state representing your prompts. Because the model can skip recomputation for cached prefixes, the compute resources required to process your requests decrease, and your costs are reduced accordingly.
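You can see this in the usage metadata that the Converse API returns: cacheWriteInputTokens counts tokens written to the cache (typically on the first call), and cacheReadInputTokens counts tokens served from the cache on subsequent calls, which are billed at a discounted input-token rate. A short sketch, assuming response is the result of a converse call like the ones above:

```python
usage = response["usage"]
print("input tokens:      ", usage["inputTokens"])                    # uncached input tokens
print("cache write tokens:", usage.get("cacheWriteInputTokens", 0))   # first call: prefix written
print("cache read tokens: ", usage.get("cacheReadInputTokens", 0))    # later calls: prefix read from cache
```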
Seamlessly integrate with other Amazon Bedrock features
Prompt caching integrates with other Amazon Bedrock features, such as Agents, allowing you to accelerate multi-step tasks and even take advantage of longer system prompts that refine agent behavior without slowing down your responses.