AWS Public Sector Blog
Germany’s International University of Applied Sciences automates creation of educational videos using generative AI, serverless on AWS
With more than 130,000 students, the International University of Applied Sciences (IU) is the largest university in Germany. IU maintains 90 percent of its course content online, with their primary student population consisting of working adults. Through its online programs, IU aims to give people worldwide access to highly individualized education, enabling them to further enrich their lives. Dr. Sven Schütt, CEO of IU, is a thought leader in the field of artificial intelligence (AI), with a vision to democratize education worldwide.
Today, 90 percent of IU’s infrastructure runs on Amazon Web Services (AWS). To meet its growth goals, IU worked directly with AWS experts through the Experience-Based Acceleration (EBA) program to make its automated video generation pipelines more scalable, modular, and robust. The EBA program also gave IU the opportunity to experiment with AWS serverless and generative AI offerings to further enhance and automate the development of educational content.
Context
IU wants to provide a modern education experience that delivers course content tailored to each student’s needs. However, such an experience requires an unconventionally large set of educational materials to serve different content to different users. As part of this experience, IU offers their students targeted learning videos summarizing different course content based on information captured in course books.
To produce these videos, the IU content delivery team initially worked with multiple human content designers and academic staff, including university professors, to summarize books, identify important sections, and create relevant images. Then, the summary, additional text, images, and diagrams were integrated into a video. Throughout the process, multiple verifications and manual corrections ensured that the content was accurate and coherent.
Even though the team was able to create high-quality video material this way, they quickly realized that the process was very time-consuming and resource-intensive. To provide a broad range of content on specific topics tailored to students’ learning objectives, the IU team needed to generate at least 24,000 videos for all programs in 2024. The team attempted to automate the video generation process by developing prototypes (for example, by using popular large language model (LLM) offerings to generate summaries), but they found that the endpoints were not reliable or responsive enough to handle steady demand. A semiautomated pipeline they developed was unstable, failing a significant amount of the time and requiring a full restart of the text generation process.
To produce the required number of videos in 2024, IU needed a more robust, automated solution that minimized manual steps and relied on a scalable, cost-optimized architecture.
In order to jump-start their development, IU and AWS leveraged the EBA program. EBA is a transformation methodology using hands-on, agile, and immersive engagements to speed up cloud migration and modernization. Hundreds of enterprises at various levels of cloud maturity have harnessed EBA to build cloud foundations, migrate at scale, modernize their businesses, and innovate. Together with IU, AWS planned an eight-week program that included requirements gathering, sprint planning, training, design work, and technology evaluation. At the end of the eight weeks, the IU and AWS teams met in person for a three-day “EBA party,” where they built an automated, end-to-end pipeline including all critical components of IU’s video generation operation.
Solution
A few sprints after the program started, IU and AWS arrived at a scalable and resilient pipeline design that automates video generation using generative AI. The solution uses LLMs to generate summaries and extract additional textual information (such as bullet points) from course books. In the next stage, appropriate images are generated for the extracted bullet points using generative AI and embedded into slides. In the final stages, the pipeline uses the summary to generate an avatar video and embeds the slides containing the bullet points and images into the generated video.
Considering the rapid developments in the generative AI space, the IU team aimed to build a modular system that can support different LLMs through a consistent API. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies along with a broad set of capabilities needed to build generative AI applications. All FMs supported by Amazon Bedrock are accessible through a REST API. To give IU full flexibility and modularity for text generation, the pipeline was extended to support multiple endpoints, including Amazon Bedrock, Amazon SageMaker, and endpoints hosted by third-party providers.
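As a minimal sketch of this consistent interface, the snippet below requests a chapter summary through Amazon Bedrock’s InvokeModel API. The AWS Region, prompt wording, and token limit are illustrative assumptions rather than IU’s production values; other Bedrock-hosted models can be selected by changing the modelId (other model families may expect a different request body).

```python
import json
import boto3

# Bedrock runtime client; the Region is an assumption for illustration.
bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")

def generate_summary(chapter_text: str,
                     model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0") -> str:
    """Ask a Bedrock-hosted LLM to summarize one book chapter (illustrative prompt)."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [
            {"role": "user",
             "content": f"Summarize the following chapter for a learning video:\n\n{chapter_text}"}
        ],
    }
    response = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
    payload = json.loads(response["body"].read())
    # Claude 3 models on Bedrock return the generated text in the "content" list.
    return payload["content"][0]["text"]
```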
On top of the modularity provided for text generation, the pipeline relies on Stability AI’s Stable Diffusion XL (SDXL) 1.0 hosted in Amazon Bedrock for generating images. Finally, for generating videos based on summaries, the solution uses video generation tools such as Synthesia or Colossyan.
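The image generation step can be sketched in the same way. The example below assumes the SDXL 1.0 model ID on Amazon Bedrock and builds a placeholder prompt from a bullet point; parameters such as cfg_scale and steps are illustrative defaults, not IU’s tuned values.

```python
import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")  # Region is an assumption

def generate_slide_image(bullet_point: str) -> bytes:
    """Generate one illustrative image for a bullet point with SDXL 1.0 on Amazon Bedrock."""
    body = {
        "text_prompts": [{"text": f"Simple educational illustration of: {bullet_point}"}],
        "cfg_scale": 7,   # illustrative default
        "steps": 30,      # illustrative default
    }
    response = bedrock.invoke_model(
        modelId="stability.stable-diffusion-xl-v1",
        body=json.dumps(body),
    )
    payload = json.loads(response["body"].read())
    # SDXL on Bedrock returns base64-encoded images in the "artifacts" list.
    return base64.b64decode(payload["artifacts"][0]["base64"])
```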
To deal with challenging scaling requirements and to reduce costs, the team designed a serverless architecture. Serverless services scale automatically as traffic and usage demands change, which saves costs and absorbs sudden spikes in traffic without performance issues.
The main aspects of the workflow are described in the architecture diagrams in figures 2 and 3.
1. Books are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket.
2. When a book is stored in the S3 bucket, an AWS Lambda function is invoked to split the book into chapters and store them in another S3 bucket, which sends event notifications to an Amazon Simple Queue Service (Amazon SQS) queue.
3. After polling a message from the chapters queue, the Generate Summary for each Chapter Lambda function reads the LLM endpoint and the corresponding prompt from an Amazon DynamoDB table and calls the LLM endpoint to generate the summary. The summaries are stored in the summaries bucket (a simplified sketch of this function is shown after the workflow steps).
4. Each entry in the summaries bucket publishes an event to an Amazon EventBridge event bus.
In the second part of the workflow, shown in Figure 3, the pipeline proceeds with video generation.
5. EventBridge is used to fan out the summaries to two different queues for generating relevant images and videos. This allows the system to work in parallel for generating slides as in step 6, and videos as in step 7.
6. The system generates slides following these steps:
a. Using the same approach as in step 3 (the respective DynamoDB table is left out of the diagram for simplicity), relevant bullet points are generated for the respective summary and stored in the bullet points queue.
b. The bullet points are used to generate images through SDXL 1.0 hosted in Amazon Bedrock. The generated images and bullet points are embedded in a single slide and stored in the slides bucket.
7. The system generates videos and identifies time frames of topics following these steps:
a. Using third-party video generation endpoints, an avatar video is generated for the respective summary and stored in an S3 bucket.
b. Amazon Transcribe is used to transcribe the speech of the avatar. This way, the time frames for different talking points can be identified. The results are stored in an S3 bucket.
8. Finally, the Embed Slides to Videos Lambda function uses the videos, their respective transcriptions, and the slides to embed each slide into the video at the correct time frame for its topic.
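To make step 3 more concrete, the following is a simplified sketch of the Generate Summary for each Chapter function, assuming S3 event notifications delivered through SQS. The bucket and table names, the table schema, and the summarizer module are placeholders rather than IU’s actual resources; generate_summary refers to the Bedrock sketch shown earlier.

```python
import json
import urllib.parse

import boto3

from summarizer import generate_summary  # helper sketched earlier (assumed module layout)

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
config_table = dynamodb.Table("llm-endpoint-config")  # table name is an assumption

def handler(event, context):
    """Triggered by the chapters queue: summarize each chapter referenced in the batch."""
    for record in event["Records"]:
        # Each SQS message carries the S3 event notification for a stored chapter.
        s3_event = json.loads(record["body"])["Records"][0]["s3"]
        bucket = s3_event["bucket"]["name"]
        key = urllib.parse.unquote_plus(s3_event["object"]["key"])

        # Read the chapter text produced by the splitting step.
        chapter_text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Look up which endpoint and prompt to use (assumed table schema).
        config = config_table.get_item(Key={"task": "chapter-summary"})["Item"]

        # Call the configured LLM endpoint, for example through Amazon Bedrock.
        summary = generate_summary(chapter_text, model_id=config["model_id"])

        # Store the summary; the summaries bucket then publishes an event to EventBridge (step 4).
        s3.put_object(
            Bucket="iu-summaries-bucket",  # assumed bucket name
            Key=key,
            Body=summary.encode("utf-8"),
        )
```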
When accessing services, it’s important to consider throughput limitations, which cap the number of requests allowed per minute (RPM) or queries per second (QPS). Here, we will look more closely at how the system deals with the throughput limitations of different generative AI models, model providers, and third-party video generation services.
To give an example, let’s assume the system used Anthropic’s Claude 3 Sonnet model for generating summaries. Figure 4 focuses on the respective part of the architecture.
Using a basic concurrency calculation and setting reserved concurrency accordingly, the Lambda functions can be configured to make requests without exceeding the RPM limit of each endpoint.
At the time of writing, the Anthropic Claude 3 Sonnet model hosted in Amazon Bedrock supports 500 RPM on demand, and this limit can be increased by purchasing provisioned throughput. Assuming that a request takes 1 minute on average, each concurrent execution issues roughly one request per minute, so the system can set the reserved concurrency limit to 500 to avoid overloading the on-demand endpoint.
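Applying that calculation could look like the following sketch, which caps the summary function’s reserved concurrency; the function name is a placeholder.

```python
import boto3

lambda_client = boto3.client("lambda")

# Cap concurrent executions so the summary function cannot exceed the endpoint's
# on-demand quota (500 RPM with roughly one request per execution per minute).
lambda_client.put_function_concurrency(
    FunctionName="generate-summary-for-each-chapter",  # placeholder function name
    ReservedConcurrentExecutions=500,
)
```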
Even with the system being proactively careful like this, the endpoints might be unable to keep their service level agreement (SLA) promises. The reasons can range from complete unavailability caused by intermittent events to a high number of combined concurrent requests.
For example, a third-party API endpoint might state 50 RPM in its documentation, but this doesn’t capture how many concurrent requests can be processed. Therefore, for applications that use third-party APIs and that scale dynamically based on user requests, it’s important to implement a backoff-and-jitter strategy in addition to respecting the API’s RPM limit. When calling AWS services, you can rely on the AWS SDKs, because most of them, including the SDK for Python (Boto3) used in the project, implement backoff and jitter.
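A minimal sketch of such a strategy is shown below, assuming a hypothetical third-party call named request_avatar_video; the retry count and sleep bounds are illustrative. For AWS calls, equivalent behavior can be enabled through Boto3’s built-in retry configuration.

```python
import random
import time

import boto3
import botocore.config


def call_with_backoff(request_fn, max_attempts: int = 6,
                      base_delay: float = 1.0, max_delay: float = 60.0):
    """Retry a flaky call with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception:  # in practice, catch the provider's throttling/5xx errors
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random amount between 0 and the exponential cap.
            time.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))


# Usage with a hypothetical third-party video endpoint:
# video = call_with_backoff(lambda: request_avatar_video(summary_text))

# For AWS services, Boto3 can retry with backoff and jitter on its own:
bedrock = boto3.client(
    "bedrock-runtime",
    config=botocore.config.Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)
```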
The EBA engagement combined the strengths of IU Group and AWS. “In the EBA program, AWS and IU employed state-of-the-art infrastructure to scale video production, blending high-tech solutions with educational excellence,” said Dr. Schütt.
To achieve IU’s goal of increasing the system’s overall modularity by being able to evaluate and potentially change FMs in the future, the team created an LLM prompt catalog during the EBA party. A prompt catalog is a central repository for storing, versioning, and sharing prompts used for querying LLMs. IU’s prompt catalog includes the FM used, the FM version, the prompt version, the prompt text, and sample inference output from the LLM. With this design, the best-performing model and prompt combination can be promoted to test environments and eventually to production environments (by propagating it to the endpoint configuration table in the diagram), where it is ultimately used by the video generation pipeline. Keeping a prompt catalog in this manner will enable IU to extend its LLM operations in the future to include both human-in-the-loop and automated FM evaluation, following the AWS FMOps/LLMOps guidance.
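As a rough sketch of one catalog entry, the snippet below writes the fields the team tracks into a DynamoDB table; the table name, key schema, and attribute names are assumptions for illustration.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
prompt_catalog = dynamodb.Table("prompt-catalog")  # table name is an assumption

# One versioned entry: which model, which prompt, and a sample output for review.
prompt_catalog.put_item(
    Item={
        "task": "chapter-summary",               # partition key (assumed schema)
        "version": "claude-3-sonnet#prompt-v3",  # sort key (assumed schema)
        "model_id": "anthropic.claude-3-sonnet-20240229-v1:0",
        "model_version": "claude-3-sonnet-20240229",
        "prompt_version": "v3",
        "prompt_text": "Summarize the following chapter for a learning video: {chapter}",
        "sample_output": "In this chapter, we explore ...",
    }
)
```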
Results
By working directly with AWS experts through the AWS EBA program, IU was able to gain valuable knowledge about key processes and technologies that will allow them to scale through 2024 and beyond. Using AWS serverless technologies, IU will be able to reduce overall costs, scale up its video generation processes to meet growing demand, and ensure that their system is more robust and resistant to faults while interacting with third-party services. In addition, the development of a prompt catalog will allow IU to efficiently evaluate new prompts and LLMs in the future, as generative AI models and capabilities continue to evolve. This will allow IU to further meet its goals around modularity.
To learn more about IU and its work with AWS, you can read how IU is securing data and advancing sustainability on AWS. If you’re interested in exploring the AWS EBA program for your own institution, you can connect with an EBA expert on the EBA site.