Orchestrate generative AI workflows with Amazon Bedrock and AWS Step Functions

Companies across all industries are harnessing the power of generative AI to address a wide range of use cases. Cloud providers have recognized the need to offer model inference through an API call, significantly streamlining the implementation of AI within applications. Although a single API call can address simple use cases, more complex ones may require multiple calls and integrations with other services.

This post discusses how to use AWS Step Functions to efficiently coordinate multi-step generative AI workflows, such as parallelizing API calls to Amazon Bedrock to quickly gather answers to lists of submitted questions. We also touch on the usage of Retrieval Augmented Generation (RAG) to optimize outputs and provide an extra layer of precision, as well as other possible integrations through Step Functions.

Introduction to Amazon Bedrock and Step Functions

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is serverless, you don’t have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

AWS Step Functions is a fully managed service that makes it easier to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function helps you scale more easily and change applications more quickly. Step Functions is a reliable way to coordinate components and step through the functions of your application, and it provides a graphical console to arrange and visualize those components as a series of steps, making it easier to build and run multi-step applications. It automatically triggers and tracks each step and retries when there are errors, so your application executes in order and as expected. Because Step Functions logs the state of each step, when things do go wrong, you can diagnose and debug problems more quickly. You can change and add steps without even writing code, so you can more easily evolve your application and innovate faster.

Orchestrating parallel tasks using the map functionality

Arrays are fundamental data structures in programming, consisting of ordered collections of elements. In the context of Step Functions, arrays play a crucial role in enabling parallel processing and efficient task orchestration. The map functionality in Step Functions uses arrays to execute multiple tasks concurrently, significantly improving performance and scalability for workflows that involve repetitive operations. Step Functions provides two different mapping strategies for iterating through arrays: inline mapping and distributed mapping, each with its own advantages and use cases.

Inline mapping

The inline map functionality allows you to perform parallel processing of array elements within a single Step Functions state machine execution. This approach is suitable when you have a relatively small number of items to process and when the processing of each item is independent of the others.
Here’s how it works:

  1. You define a Map state in your Step Functions state machine.
  2. Step Functions iterates over the array and runs the specified tasks for each element concurrently.
  3. The results of each iteration are collected and made available for subsequent steps in the state machine.

Inline mapping is efficient for lightweight tasks and helps avoid launching multiple Step Functions executions, which can be more costly and resource intensive. But there are limitations. When using inline mapping, only JSON payloads can be accepted as input, your workflow’s execution history can’t exceed 25,000 entries, and you can’t run more than 40 concurrent map iterations.
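
To make this concrete, the following is a minimal sketch of an inline Map state in Amazon States Language (ASL); the state names and MaxConcurrency value are illustrative, not part of the solution built later in this post:

{
  "Comment": "Minimal inline Map sketch with illustrative names",
  "StartAt": "ProcessQuestions",
  "States": {
    "ProcessQuestions": {
      "Type": "Map",
      "ItemsPath": "$.questions",
      "MaxConcurrency": 5,
      "ItemProcessor": {
        "ProcessorConfig": { "Mode": "INLINE" },
        "StartAt": "HandleItem",
        "States": {
          "HandleItem": { "Type": "Pass", "End": true }
        }
      },
      "End": true
    }
  }
}

Each iteration receives one element of the questions array as its input, and the Map state's output is the array of iteration results in the original order.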

Distributed mapping

The distributed map functionality is designed for scenarios where many items need to be processed or when the processing of each item is resource intensive or time-consuming. Instead of handling all items within a single execution, Step Functions launches a separate execution for each item in the array, letting you concurrently process large-scale data sources stored in Amazon Simple Storage Service (Amazon S3), such as a single JSON or CSV file containing large amounts of data, or even a large set of Amazon S3 objects. This approach offers the following advantages:

  • Scalability – By distributing the processing across multiple executions, you can scale more efficiently and take advantage of the built-in parallelism in Step Functions
  • Fault isolation – If one execution fails, it doesn’t affect the others, providing better fault tolerance and reliability
  • Resource management – Each execution can be allocated its own resources, helping prevent resource contention and providing consistent performance

However, distributed mapping can incur additional costs due to the overhead of launching multiple Step Functions executions.
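
For comparison, the following sketch shows a distributed Map state that reads a JSON file of items directly from Amazon S3 and runs each iteration as a separate child workflow execution; the bucket, key, and state names are placeholders:

{
  "Comment": "Minimal distributed Map sketch; bucket and key are placeholders",
  "StartAt": "ProcessLargeDataset",
  "States": {
    "ProcessLargeDataset": {
      "Type": "Map",
      "MaxConcurrency": 100,
      "ItemReader": {
        "Resource": "arn:aws:states:::s3:getObject",
        "ReaderConfig": { "InputType": "JSON" },
        "Parameters": {
          "Bucket": "amzn-s3-demo-bucket",
          "Key": "questions.json"
        }
      },
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "DISTRIBUTED",
          "ExecutionType": "EXPRESS"
        },
        "StartAt": "HandleItem",
        "States": {
          "HandleItem": { "Type": "Pass", "End": true }
        }
      },
      "End": true
    }
  }
}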

Choosing a mapping approach

In summary, inline mapping is suitable for lightweight tasks with a relatively small number of items, whereas distributed mapping is better suited for resource-intensive tasks or large datasets that require better scalability and fault isolation. The choice between the two mapping strategies depends on the specific requirements of your application, such as the number of items, the complexity of processing, and the desired level of parallelism and fault tolerance.

Another important consideration when building generative AI applications that use Amazon Bedrock and Step Functions Map states together is the Amazon Bedrock runtime quotas. Generally, these model quotas allow for hundreds or even thousands of requests per minute. However, you may run into issues when running a large map against models with low requests-per-minute quotas, such as image generation models. In that scenario, you can include a retrier in the error handling of your Map state.
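
For example, a Retry configuration like the following on the Amazon Bedrock Task state inside your Map state backs off and retries throttled calls. The values shown are a starting point to tune, and the error name assumes the optimized Amazon Bedrock integration, which surfaces throttling as Bedrock.ThrottlingException:

"Retry": [
  {
    "ErrorEquals": ["Bedrock.ThrottlingException"],
    "IntervalSeconds": 5,
    "MaxAttempts": 5,
    "BackoffRate": 2.0,
    "JitterStrategy": "FULL"
  }
]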

Solution overview

In the following sections, we get hands-on to see how this solution works. Amazon Bedrock has a variety of model choices to address the specific needs of individual use cases. For the purposes of this exercise, we use Amazon Bedrock to run inference on Anthropic’s Claude 3.5 Haiku model to answer an array of questions, because it’s a fast and cost-effective option.

Our goal is to create an express state machine in Step Functions that uses the inline Map state to parse the JSON array of questions sent by an API call from an application. For each question, Step Functions will scale out horizontally, making a concurrent call to Amazon Bedrock. After all the answers come back, Step Functions will concatenate them into a single response, which our original calling application can then use for further processing or display to end users.

The payload we send consists of an array of nine Request for Proposal (RFP) questions, as well as a company description:

{
  "questions": [
    "Can you describe your technical capabilities and infrastructure?",
    "What security measures do you have in place to protect data and privacy?",
    "Can you provide case studies or examples of similar projects you have handled?",
    "How do you handle project management, and what tools do you use?",
    "What are your support and maintenance services like?",
    "What is your pricing model?",
    "Can you provide references from other clients?",
    "How do you ensure the scalability of your solution?",
    "What is your approach to data backup and recovery?"
  ],
  "description": "Our company, AnyCompany Tech, boasts a robust technical infrastructure that allows us to handle complex projects with ease. Our strength lies in our dynamic team of experts and our cutting-edge technology, which, when combined, can deliver solutions of any scale. We've worked with clients across the globe, for instance, our project with Example Corp involved a sophisticated upgrade of their system. In terms of security, we prioritize data privacy and have put in place stringent measures to ensure that all data is stored securely. We're quite proud of our project with AnyCompany Networks, where we overhauled their security systems to bolster their data protection capabilities. We use a range of project management tools, including Product-1 and Product-2, which allows us to customize our approach to each client's needs. Our pricing model varies depending on the project, but we always aim to provide cost-effective solutions. We've had numerous positive feedback from our clients, with Example Corp and AnyCompany Networks among those who have expressed satisfaction with our services. We're more than happy to provide further references upon request. Software updates and upgrades are a critical part of our service. We have a dedicated team that ensures all systems are up-to-date and running smoothly. Furthermore, our solutions are designed to be scalable, ensuring that they can grow alongside your business. Lastly, in terms of data backup and recovery, we have a comprehensive plan in place, which includes regular data backups and a robust recovery strategy. We understand the importance of data in today's world and we're committed to ensuring its safety and accessibility at all times."
}

You can follow the step-by-step guide in this post or use the prebuilt AWS CloudFormation template in the us-west-2 Region to provision the necessary AWS resources. AWS CloudFormation gives developers and businesses a straightforward way to create a collection of related AWS and third-party resources, and provision and manage them in an orderly and predictable fashion.

Prerequisites

You need the following prerequisites to follow along with this solution implementation:

  • An AWS account
  • Access to Anthropic’s Claude 3.5 Haiku model enabled in Amazon Bedrock, in the us-west-2 Region

Create a state machine and add a Map state

In the AWS Management Console in the us-west-2 Region, open Step Functions, then choose Get started and Create your own to open a blank canvas in Step Functions Workflow Studio.

Edit the state machine by adding an inline Map state with items sourced from a JSON payload.

Next, tell the Map state where the array of questions is located by selecting Provide a path to items array and pointing it to the questions array using JSONPath syntax. Selecting Modify items with ItemSelector allows you to structure the payload, which is then sent to each of the child workflow executions. Here, we map the description through with no change and use $$.Map.Item.Value to map the question from the array at the index of the map iteration.
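
Given that mapping, the relevant Map state fields look something like the following sketch; the nested questions shape matches the JSONPath used in the prompt payload later in this post:

"ItemsPath": "$.questions",
"ItemSelector": {
  "description.$": "$.description",
  "questions": {
    "question.$": "$$.Map.Item.Value"
  }
}

With this ItemSelector, each child workflow execution receives a payload containing the shared description and a single question from the array.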

Invoke an Amazon Bedrock model

Next, add a Bedrock: InvokeModel action task as the next state within the Map state.

Now you can structure your Amazon Bedrock API calls through Workflow Studio. Because we’re using Anthropic’s Claude 3.5 Haiku model on Amazon Bedrock, we select the corresponding model ID for Bedrock model identifier and edit the provided sample with instructions to incorporate the incoming payload. Depending on which model you select, the payload may have a different structure and prompt syntax.
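
In ASL, the resulting Task state resembles the following sketch; the state name is illustrative, the model ID shown is Claude 3.5 Haiku’s identifier at the time of writing, and the messages array gets filled in with the prompt payload built in the next section:

"InvokeBedrockModel": {
  "Type": "Task",
  "Resource": "arn:aws:states:::bedrock:invokeModel",
  "Parameters": {
    "ModelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
    "Body": {
      "anthropic_version": "bedrock-2023-05-31",
      "max_tokens": 800,
      "messages": []
    }
  },
  "End": true
}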

Build the payload

The prompt you build uses the Amazon States Language intrinsic function States.Format to perform string interpolation, substituting each {} placeholder with the corresponding variable declared after the string. We must also append .$ to our text key to reference a node in this state’s JSON input.

When building out this prompt, you should be very prescriptive in asking the model to do the following:

  • Answer the questions thoroughly using the following description
  • Not repeat the question
  • Only respond with the answer to the question

We set max_tokens to 800 to allow for longer responses from Amazon Bedrock. Additionally, you can include other inference parameters such as temperature, top_p, top_k, and stop_sequences. Tuning these parameters can help limit the length of, or influence the randomness or diversity of, the model’s response. For the sake of this example, we keep all other optional parameters at their defaults.

{
  "anthropic_version": "bedrock-2023-05-31",
  "max_tokens": 800,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text.$": "States.Format('Answer following question thoroughly, using the following description. Do not repeat the question. Only respond with the answer to the question. Question: {} Description: {}', $.questions.question, $.description)"
        }
      ]
    }
  ]
}

Form the response

To return a cleaner response to our calling application, we use a few options to transform the output of the Amazon Bedrock Task state. First, use ResultSelector to filter the response coming back from the service and pull out the text completion. Then add the original input back to the output using ResultPath, and finish by filtering the final output using OutputPath so that the description isn’t carried through unnecessarily for each array item.
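
Given the child workflow input shape from our ItemSelector and the Anthropic messages response format, these three fields on the InvokeModel Task state look something like the following sketch; the exact paths may differ depending on how you structured your payload:

"ResultSelector": {
  "completion.$": "$.Body.content[0].text"
},
"ResultPath": "$.questions.response",
"OutputPath": "$.questions"

ResultSelector keeps only the generated text, ResultPath nests it under the original question, and OutputPath drops the description so each array item contains just the question and its response.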

To simulate the state machine being called by an API, choose Execute in Workflow Studio. Using the preceding input, the Step Functions output should look like the following code, although it may vary slightly due to the diversity and randomness of FMs:

{
  "output": [
    {
      "question": "Can you describe your technical capabilities and infrastructure?",
      "response": {
        "completion": "AnyCompany Tech offers a comprehensive technical infrastructure powered by a dynamic expert team and cutting-edge technology capable of handling complex projects at any scale. The company prioritizes data security through stringent protection measures, uses advanced project management tools like Product-1 and Product-2, and offers customizable solutions. They provide global client support with a focus on scalability, software updates, and robust data backup and recovery strategies. Their approach emphasizes cost-effectiveness, client satisfaction, and adaptable technological solutions that can grow with a client's business needs. The team maintains a dedicated focus on keeping systems updated, secure, and responsive to evolving technological requirements."
      }
    },
    {
      "question": "What security measures do you have in place to protect data and privacy?",
      "response": {
        "completion": "At AnyCompany Tech, we have implemented stringent security measures to protect data and privacy. These include a comprehensive data backup and recovery strategy with regular backups, robust security systems that have been successfully deployed for clients like AnyCompany Networks, and dedicated teams ensuring systems are up-to-date. We prioritize data privacy and use advanced technical infrastructure and cutting-edge technology to secure client information. Our approach involves customizable security solutions, scalable systems, and a commitment to protecting data accessibility and safety at all times."
      }
    },
    {
      "question": "Can you provide case studies or examples of similar projects you have handled?",
      "response": {
        "completion": "Based on the description provided, we have successfully completed projects for Example Corp, involving a sophisticated system upgrade, and AnyCompany Networks, where we overhauled their security systems. These case studies demonstrate our ability to handle complex technical infrastructure projects across different industries. Our team has worked with global clients, delivering solutions that prioritize security, data privacy, and scalability. While specific technical details aren't elaborated, our track record shows proven expertise in system improvements and security enhancements. We have received positive client feedback from both Example Corp and AnyCompany Networks, and we are prepared to provide additional references upon request."
      }
    },
    {
      "question": "How do you handle project management, and what tools do you use?",
      "response": {
        "completion": "At AnyCompany Tech, we utilize Product-1 and Product-2 as our primary project management tools, allowing us to create customized approaches for each client's unique needs. Our project management strategy is built on our dynamic team of experts and cutting-edge technology, which enables us to handle complex projects efficiently. We prioritize adaptability, scalability, and client-specific solutions, ensuring that our project management approach can accommodate projects of any scale. Our tools and processes are designed to provide transparent tracking, seamless communication, and effective resource allocation, which allows us to deliver high-quality results consistently. Additionally, we have dedicated teams for software updates, system maintenance, and data backup/recovery to support our robust project management infrastructure."
      }
    },
    {
      "question": "What are your support and maintenance services like?",
      "response": {
        "completion": "Our support and maintenance services are comprehensive and tailored to each client's unique needs. We provide a dedicated team that ensures all systems are up-to-date and running smoothly, with regular software updates and upgrades. Our solutions are designed to be scalable, allowing them to grow alongside your business. We have a robust data backup and recovery strategy that includes regular backups and a comprehensive recovery plan to ensure data safety and accessibility. Our technical infrastructure, combined with our expert team, allows us to deliver high-quality, secure, and efficient maintenance services. We use advanced project management tools like Product-1 and Product-2 to customize our approach, and we prioritize data privacy with stringent security measures. Our goal is to provide cost-effective solutions that meet and exceed client expectations, with a track record of successful projects and satisfied clients like Example Corp and AnyCompany Networks."
      }
    },
    {
      "question": "What is your pricing model?",
      "response": {
        "completion": "Our pricing model is flexible and tailored to each specific project. We aim to provide cost-effective solutions that are customized to meet the unique needs of our clients. The exact pricing depends on the scope, complexity, and requirements of the project, ensuring that we deliver value while addressing the specific challenges of each client's technological needs."
      }
    },
    {
      "question": "Can you provide references from other clients?",
      "response": {
        "completion": "Yes, we can provide references from clients like Example Corp and AnyCompany Networks, who have previously worked with us and expressed satisfaction with our services. We are more than happy to connect you with these references upon request, allowing you to gain direct insights into our capabilities, project performance, and customer satisfaction."
      }
    },
    {
      "question": "How do you ensure the scalability of your solution?",
      "response": {
        "completion": "We ensure scalability through a multi-faceted approach. Our solutions are intentionally designed to be flexible and adaptable, capable of growing alongside our clients' businesses. We leverage cutting-edge technology and our dynamic team of experts to create infrastructure that can handle increasingly complex projects. Our robust technical infrastructure allows us to seamlessly expand systems, and we use advanced project management tools like Product-1 and Product-2 to customize solutions that can scale effectively. Additionally, we have a dedicated team focused on software updates and upgrades, which helps maintain system performance and enables smooth expansion. Our comprehensive approach to scalability is further supported by our data backup and recovery strategies, ensuring system reliability and continuity as businesses grow. By prioritizing adaptability and having the technical expertise to implement scalable solutions, we can confidently meet the evolving needs of our clients across various industries."
      }
    },
    {
      "question": "What is your approach to data backup and recovery?",
      "response": {
        "completion": "Our data backup and recovery approach encompasses regular, comprehensive data backups and a robust recovery strategy. We understand the critical importance of data protection and have implemented a systematic process to ensure data safety and accessibility at all times. Our strategy involves creating multiple backup copies, storing them in secure, redundant locations, and maintaining a well-defined recovery protocol that allows for quick restoration of data in the event of any potential loss or system failure. This approach is designed to minimize downtime and protect our clients' valuable information, reflecting our commitment to maintaining the highest standards of data security and reliability."
      }
    }
  ],
  "outputDetails": {
    "truncated": false
  }
}

Clean up resources

To delete this solution, navigate to the State machines page on the Step Functions console, select your state machine, choose Delete, and enter delete to confirm. It will be marked for deletion and will be deleted when all executions are stopped.

RAG and other possible integrations

RAG is a strategy that enhances the output of a large language model (LLM) by allowing it to reference an authoritative external knowledge base, generating more accurate and relevant responses. This powerful tool can extend the capabilities of LLMs to specific domains or an organization’s internal knowledge base without needing to retrain or even fine-tune the model.

A straightforward way to integrate RAG into the preceding RFP example is by adding a Bedrock Runtime Agents: Retrieve action task to your Map state before invoking the model. This enables queries to Amazon Bedrock Knowledge Bases, which supports various vector storage databases, including the Amazon OpenSearch Serverless vector engine, Pinecone, Redis Enterprise Cloud, and soon Amazon Aurora and MongoDB. Using Knowledge Bases to ingest and vectorize example RFPs and documents stored in Amazon S3 eliminates the need to include a description with the question array. Also, because a vector store can accommodate a broader range of information than a single prompt is able to, RAG can greatly enhance the specificity of the responses.
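
A sketch of what that retrieval step could look like using the AWS SDK integration for the Amazon Bedrock Agent Runtime Retrieve API follows; the knowledge base ID is a placeholder, and the input path assumes the same payload shape as the earlier example:

"RetrieveContext": {
  "Type": "Task",
  "Resource": "arn:aws:states:::aws-sdk:bedrockagentruntime:retrieve",
  "Parameters": {
    "KnowledgeBaseId": "EXAMPLEKB123",
    "RetrievalQuery": {
      "Text.$": "$.questions.question"
    }
  },
  "ResultPath": "$.retrievedContext",
  "Next": "InvokeBedrockModel"
}

The retrieved passages land under retrievedContext, which you can then interpolate into the model prompt with States.Format alongside, or instead of, the static description.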

In addition to Amazon Bedrock Knowledge Bases, there are other options to integrate for RAG depending on your existing tech stack, such as directly with an Amazon Kendra Task state or with a vector database of your choosing through third-party APIs using HTTP Task states.

Step Functions offers composability, allowing you to seamlessly integrate over 9,000 AWS API actions from more than 200 services directly into your workflows. These optimized service integrations simplify the use of common services like AWS Lambda, Amazon Elastic Container Service (Amazon ECS), AWS Glue, and Amazon EMR, offering features such as IAM policy generation and the Run A Job (.sync) pattern, which automatically waits for the completion of asynchronous jobs. Another common pattern seen in generative AI applications is chaining models together to accomplish secondary tasks, like language translation after a primary summarization task is completed. This can be accomplished by adding another Bedrock: InvokeModel action task just as we did earlier.
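
For instance, to chain a translation step onto a summarization output, you could add a second Task state along these lines; the state name, prompt, target language, and input path are illustrative:

"TranslateSummary": {
  "Type": "Task",
  "Resource": "arn:aws:states:::bedrock:invokeModel",
  "Parameters": {
    "ModelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
    "Body": {
      "anthropic_version": "bedrock-2023-05-31",
      "max_tokens": 800,
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text.$": "States.Format('Translate the following text to French: {}', $.response.completion)"
            }
          ]
        }
      ]
    }
  },
  "End": true
}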

Conclusion

In this post, we demonstrated the power and flexibility of Step Functions for orchestrating parallel calls to Amazon Bedrock. We explored two mapping strategies—inline and distributed—for processing small and large datasets, respectively. Additionally, we delved into a practical use case of answering a list of RFP questions, demonstrating how Step Functions can efficiently scale out and manage multiple Amazon Bedrock calls.

We introduced the concept of RAG as a strategy for enhancing the output of an LLM by referencing an external knowledge base and demonstrated multiple ways to incorporate RAG into Step Functions state machines. We also highlighted the integration capabilities of Step Functions, particularly the ability to invoke over 9,000 AWS API actions from more than 200 services directly from your workflow.

As next steps, explore the possibilities of application patterns offered by the GenAI Quick Start PoCs GitHub repo as well as various Step Functions integrations through sample project templates within Workflow Studio. Also, consider integrating RAG into your workflows to use your organization’s internal knowledge base or specific domain expertise.


About the Author

Dimitri Restaino is a Brooklyn-based AWS Solutions Architect specialized in designing innovative and efficient solutions for healthcare companies, with a focus on the potential applications of AI, blockchain and other promising industry disruptors. Off the clock, he can be found spending time in nature or setting fastest laps in his racing sim.