AWS for M&E Blog
Media content localization with AWS AI services and Amazon Bedrock
This blog post walks through a content localization approach where English is translated to Korean.
In media production, subtitling plays a crucial role in viewer comprehension by providing transcriptions that can be translated from one language to another. This capability has helped expedite the globalization of media content. For example, the popular Netflix show Squid Game became accessible to a broad demographic through effective subtitling. Subtitles are vital in fostering a connection between viewers and media, bridging linguistic barriers.
This blog post describes two methods to translate subtitles: the sequential process of translation and refinement using artificial intelligence (AI) services from Amazon Web Services (AWS), and direct translation through prompt engineering with Amazon Bedrock. In addition, we explain the automated process of generating short video clips based on subtitle files.
Introduction and high-level flow
- Amazon Transcribe is a speech-to-text (STT) service from AWS that generates a WebVTT file from a sample MP4 file (What is Amazon Transcribe_.mp4). For the purposes of this blog post, it is not necessary to use the same MP4 file.
- Amazon Translate supports the following file formats for input data: Plain text (.txt); HTML (.html); Microsoft Word (.docx), Excel (.xlsx), PowerPoint (.pptx); and XLIFF 1.2 (.xlf). In this case, WebVTT subtitle files generated from Amazon Transcribe must be converted into plain text (.txt) format for translation.
- Because it is not feasible to directly translate SRT (SubRip) or WebVTT files with Amazon Translate, preprocessing steps are implemented to optimize translation outcomes.
- The steps for processing WebVTT subtitle files to enhance readability are executed on an Amazon SageMaker Studio notebook instance, as depicted in Figure 1.
- Use Amazon Bedrock to directly translate subtitles with prompt engineering techniques.
- Use Amazon Bedrock to generate concise video clips incorporating the WebVTT file.
Summary of steps
[Subtitle translation with AWS AI services]
- Step 1: Create WebVTT file using Amazon Transcribe
- Step 2: Process the WebVTT file with Amazon SageMaker Studio
- Convert the WebVTT file to a text file and perform basic processing
- Initiate an Amazon Translate job
- Step 3: Convert the translated text file back into WebVTT format to enhance readability
[Utilizing Amazon Bedrock]
- Step 4: Use Amazon Bedrock to directly translate the WebVTT file through prompt engineering
- Step 5: Delineate a methodology to generate a concise video clip using Amazon Bedrock
Prerequisites
The following prerequisites are needed for this walkthrough:
- An AWS account
- Amazon SageMaker Studio
- Grant the Execution Role access to services such as Amazon S3, Amazon Polly, and Amazon Bedrock
- A new Amazon S3 bucket
- Amazon Bedrock access to Claude 3
Step 1: Create a WebVTT file using Amazon Transcribe
Start by using Amazon Transcribe to generate a WebVTT file from a sample video file.
- Within the newly created Amazon S3 bucket, create a folder and upload the sample video file to that location. (What is Amazon Transcribe_.mp4)
- Within the Amazon Transcribe service, initiate the ‘Create Job’ process and specify the Input and Output path as depicted in Figure 3. For the Output path, select the ‘Customer specified’ option and designate a new folder to store the generated WebVTT file. In this case, the folder ‘TranscribeVTT’ is designated.
- Retain the default settings for the remaining configuration options and then create the job.
- When the job Status is ‘Complete,’ validate that the WebVTT file has been successfully generated and stored in the designated Amazon S3 location.
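The console flow above can also be scripted. The following is a minimal boto3 sketch of the equivalent API call, offered as an alternative to the console steps; the job name, bucket name, and file name are placeholders to replace with your own.

```python
import boto3

transcribe = boto3.client("transcribe")

# Start a transcription job that also emits a WebVTT subtitle file.
transcribe.start_transcription_job(
    TranscriptionJobName="what-is-amazon-transcribe",  # placeholder job name
    Media={"MediaFileUri": "s3://YOUR_BUCKET/input/What is Amazon Transcribe_.mp4"},
    MediaFormat="mp4",
    LanguageCode="en-US",
    OutputBucketName="YOUR_BUCKET",
    OutputKey="TranscribeVTT/",      # the 'Customer specified' output folder
    Subtitles={"Formats": ["vtt"]},  # request WebVTT subtitle output
)
```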
Step 2: Process the WebVTT file with Amazon SageMaker Studio
Access the Amazon SageMaker Studio service and instantiate a new notebook instance. Specify ‘Python 3’ as the kernel, retain the default instance type configuration, and proceed with the creation process.
To facilitate an uninterrupted workflow, execute the following code snippet.
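The following is a minimal sketch of such a setup cell; the webvtt-py and moviepy packages and the file names are assumptions for this walkthrough:

```python
# Install helper packages used in the rest of this walkthrough (assumed choices).
%pip install -q webvtt-py moviepy

import boto3

# Placeholder bucket and key; use the output location from Step 1.
bucket = "YOUR_BUCKET"
s3 = boto3.client("s3")
s3.download_file(bucket, "TranscribeVTT/What is Amazon Transcribe_.vtt", "subtitle.vtt")
```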
The subsequent code snippet facilitates an understanding of the structural composition of the WebVTT file.
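A sketch of such an inspection cell, using the webvtt-py package (an assumed choice) to print each cue's timestamps and text:

```python
import webvtt

# Each WebVTT cue consists of a start timestamp, an end timestamp, and text.
for caption in webvtt.read("subtitle.vtt"):
    print(caption.start, "-->", caption.end)
    print(caption.text)
    print()
```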
The WebVTT file is converted to one of the file formats supported by Amazon Translate, specifically the text (.txt) format in this case. The modified text file is saved as ‘textconverted.txt’ within a newly created directory named ‘txt’.
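A minimal sketch of that conversion, which drops the timestamps and keeps only the spoken text (file and directory names follow the ones given above):

```python
import os
import webvtt

os.makedirs("txt", exist_ok=True)

# Keep only the cue text, one line per cue, so Amazon Translate sees plain text.
with open("txt/textconverted.txt", "w", encoding="utf-8") as f:
    for caption in webvtt.read("subtitle.vtt"):
        f.write(caption.text.replace("\n", " ") + "\n")
```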
As shown in Figure 5, if translation is performed directly on the current WebVTT file structure, the meaning could be diluted or incorrectly conveyed. This is a common occurrence due to structural differences in sentence placement across languages. In the example of Figure 6, the Korean text is translated incorrectly.
After the following processing steps, the completed sentences are stored in an Amazon S3 bucket.
[Example]
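A simplified sketch of such a processing step; it joins the cue fragments into period-terminated sentences and uploads the result to Amazon S3 (the bucket and key names are placeholders):

```python
import boto3
import webvtt

# Join the cue fragments, then split on periods so every line is one
# complete sentence rather than a timeline-sized fragment.
full_text = " ".join(c.text.replace("\n", " ") for c in webvtt.read("subtitle.vtt"))
sentences = [s.strip() + "." for s in full_text.split(".") if s.strip()]

with open("txt/sentences.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sentences))

# Placeholder bucket/prefix; Amazon Translate batch jobs read from an S3 folder.
boto3.client("s3").upload_file("txt/sentences.txt", "YOUR_BUCKET", "txt/sentences.txt")
```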
Perform the translation of the text file using Amazon Translate by executing the following statement.
[Example]
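A sketch of starting the batch translation job with boto3; the S3 URIs and the data access role ARN are placeholders for your own resources:

```python
import boto3

translate = boto3.client("translate")

translate.start_text_translation_job(
    JobName="subtitle-en-to-ko",
    InputDataConfig={
        "S3Uri": "s3://YOUR_BUCKET/txt/",  # folder holding the .txt input
        "ContentType": "text/plain",
    },
    OutputDataConfig={"S3Uri": "s3://YOUR_BUCKET/translated/"},
    # IAM role that grants Amazon Translate access to the buckets above.
    DataAccessRoleArn="arn:aws:iam::ACCOUNT_ID:role/TranslateDataAccessRole",
    SourceLanguageCode="en",
    TargetLanguageCodes=["ko"],
)
```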
Ultimately, you can verify the translation results to ensure no loss of meaning or mistranslations occurred. Next, let’s explore the process of converting this translated text file back into WebVTT format.
Step 3: Convert the translated text file back into WebVTT format to enhance readability
While the previous step involved preprocessing the English WebVTT file into a text file for optimal translation, this step focuses on properly placing the translated sentences back into the original WebVTT timeline. This blog post introduces two methods for creating the translated WebVTT file.
Method 1: Tokenize the entire sentence iteratively
The first method involves recognizing sentences in the original English text that end with a period as single sentences. The translated sentence is then repeatedly displayed for the duration of that sentence’s timeline. This allows viewers to understand the context by seeing the translated subtitles while the original video plays.
[Example]
00:00:01.970 --> 00:00:06.119
Transcribing audio can be complex, time consuming and expensive.

00:00:06.329 --> 00:00:10.800
You either need to hire someone to do it manually implement applications that are

00:00:10.810 --> 00:00:15.710
difficult to maintain or use hard to integrate services that yield poor results.
As depicted in the previous example, the text from 00:00:06.329 to 00:00:15.710 forms one complete sentence, so the entire translated sentence is displayed repeatedly across both cues during that timeline. However, the flow of the sentences may not look natural.
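A simplified sketch of Method 1, assuming the Korean sentence list produced by Amazon Translate is available; a period at the end of an English cue marks the sentence boundary:

```python
import webvtt

def repeat_sentence_subtitles(vtt_path, ko_sentences, out_path):
    """Write a Korean VTT where every cue repeats the full translated
    sentence that its English fragment belongs to (Method 1)."""
    vtt = webvtt.read(vtt_path)
    idx = 0
    for caption in vtt:
        original = caption.text.replace("\n", " ").rstrip()
        caption.text = ko_sentences[min(idx, len(ko_sentences) - 1)]
        # A sentence-ending period means the next cue starts a new sentence.
        if original.endswith("."):
            idx += 1
    vtt.save(out_path)

repeat_sentence_subtitles("subtitle.vtt", ko_sentences, "subtitle_ko_method1.vtt")
```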
Method 2: Redistribute the entire sentence
The second method involves redistributing sentences as needed to maintain the overall flow and context.
The following are three approaches for redistributing the subtitle file based on different conditions:
- When a single sentence fits into a single timeline, translate the entire subtitle and position the translated sentence accordingly.
- If an entire sentence is divided across multiple subtitle timelines, increase the lag value to adjust the index of the translated sentence to match the timeline. Then, use the ‘divide_sentence()’ function (sketched after the following data list) to split the sentence according to the size of each subtitle (buffer) and position the subtitles accordingly.
- If a single subtitle contains multiple sentences, increase the lead value to adjust the index of the translated sentences. Then, position the translated multiple sentences within their respective subtitles.
To conduct Method 2, three pieces of data are required:
- Original (English) VTT data
- Original (English) text list
- Translated (Korean) text list (translated using Amazon Translate)
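The following is a simplified sketch of the redistribution idea, including an assumed implementation of the ‘divide_sentence()’ helper; it splits a translated sentence into chunks roughly proportional to the lengths of the original cue fragments (the buffer sizes):

```python
def divide_sentence(sentence, weights):
    """Split `sentence` into len(weights) chunks whose word counts are
    roughly proportional to `weights` (e.g. original cue text lengths)."""
    words = sentence.split()
    total, chunks, start = sum(weights), [], 0
    for i, w in enumerate(weights):
        end = len(words) if i == len(weights) - 1 else start + max(1, round(len(words) * w / total))
        chunks.append(" ".join(words[start:end]))
        start = min(end, len(words))
    return chunks

# One English sentence spread across two cue timelines: split its Korean
# translation in proportion to the two English fragment lengths.
weights = [84, 80]  # character counts of the two English fragments above
ko = "유지 관리가 어려운 애플리케이션을 수동으로 구현할 사람을 고용하거나, 통합이 어려운 서비스를 사용하여 결과가 좋지 않을 수 있습니다"
print(divide_sentence(ko, weights))
```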
Result example
With this method, the translated WebVTT subtitles can be positioned appropriately. Following is an example of a Korean VTT subtitle.
00:00:01.970 --> 00:00:06.119
오디오 텍스트 변환은 복잡하고 시간이 많이 걸리며 비용이 많이 들 수 있습니다

00:00:06.329 --> 00:00:10.800
유지 관리가 어려운 애플리케이션을 수동으로 구현할 사람을 고용하거나, 통합이

00:00:10.810 --> 00:00:15.710
어려운 서비스를 사용하여 결과가 좋지 않을 수 있습니다
Step 4: Use Amazon Bedrock to directly translate the WebVTT file through prompt engineering
As the structure of subjects, objects, and verbs varies across languages, directly translating the original text can lead to subtitles that lack fluency or fail to capture the viewer’s attention. In such cases, long context prompt engineering techniques can be employed to directly translate subtitle files, like WebVTT and SRT.
- Provide the large language model (LLM) with a persona and conditions that it must adhere to
- Then, place the target subtitle file before the conditions and the query
- Target data must be in XML tags so it’s clearly separated from the instructions
[Query example]
You are a master of subtitle translation. Below is a webVTT formatted subtitle that you must work on.
<subtitle>
00:00:01.970 --> 00:00:06.119
Transcribing audio can be complex, time consuming and expensive.

00:00:06.329 --> 00:00:10.800
You either need to hire someone to do it manually implement applications that are

00:00:10.810 --> 00:00:15.710
difficult to maintain or use hard to integrate services that yield poor results.
</subtitle>
Condition:
1. Leave the timeline and translate English part to Korean.
2. Make it smooth from the audience’s perspective.
3. Do not provide extra information besides subtitle.
[Result example]
<subtitle>
00:00:01.970 --> 00:00:06.119
오디오 전사 작업은 복잡하고 시간이 많이 걸리며 비용이 많이 듭니다.

00:00:06.329 --> 00:00:10.800
전사를 수작업으로 맡기거나 유지보수가 어려운 애플리케이션을 개발하거나

00:00:10.810 --> 00:00:15.710
통합하기 어렵고 결과물 품질이 낮은 서비스를 이용해야 합니다.
</subtitle>
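A minimal sketch of sending this query to Claude 3 on Amazon Bedrock with boto3; the region and model ID are assumptions, and the prompt is the query text shown above:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # assumed region

with open("subtitle.vtt", encoding="utf-8") as f:
    subtitle = f.read()

# Persona, then the target subtitle in XML tags, then the conditions.
prompt = (
    "You are a master of subtitle translation. Below is a webVTT formatted "
    f"subtitle that you must work on.\n<subtitle>\n{subtitle}\n</subtitle>\n"
    "Condition: 1. Leave the timeline and translate English part to Korean. "
    "2. Make it smooth from the audience's perspective. "
    "3. Do not provide extra information besides subtitle."
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed Claude 3 model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```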
Step 5: A methodology to generate a concise video clip using Amazon Bedrock
When using Amazon Transcribe, subtitle files ranging from a few dozen to several hundred timeline entries can be generated. By leveraging Claude 3, Anthropic’s large language model (LLM), on Amazon Bedrock, you can extract the subtitle entries that specifically highlight key moments throughout the entire video.
Following are the steps to create a video summary or short clips:
- Use prompt engineering to extract key moments from the WebVTT files.
a. Documentary example: “Extract the important parts from the video.”
b. e-commerce example: Persona: You are a professional short-clip editor. Task: “Select the highlight where the speaker accentuates the advantages of the product.”
- Use the ‘re’ module to extract the time ranges from the summarized WebVTT file.
- Use the ‘MoviePy’ library to concatenate the video clips and build an MP4 file (see the sketch after this list).
- Upload the MP4 file to Amazon S3 if it needs to be further transcoded using AWS Elemental MediaConvert.
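The following is a simplified sketch of steps 2 through 4, assuming the summarized WebVTT text returned by the LLM is already held in a string; the file names are placeholders:

```python
import re
from moviepy.editor import VideoFileClip, concatenate_videoclips

def to_seconds(ts):
    """Convert an 'HH:MM:SS.mmm' WebVTT timestamp to seconds."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

# Extract the (start, end) time ranges the LLM kept in its summary.
pattern = r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})"
ranges = re.findall(pattern, summarized_vtt)  # summarized_vtt: the LLM's output

# Cut each highlighted range out of the source video and concatenate them.
video = VideoFileClip("What is Amazon Transcribe_.mp4")
clips = [video.subclip(to_seconds(start), to_seconds(end)) for start, end in ranges]
concatenate_videoclips(clips).write_videofile("highlight.mp4")
```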
Conclusion
This blog post demonstrated how to use AI services from AWS to automate and streamline media content localization. Builders can use Amazon Transcribe and Amazon Translate, adding preprocessing steps to improve translation quality. By leveraging Amazon Bedrock with prompt engineering, subtitles can be translated directly, though human-in-the-loop review is still required. Furthermore, video editors can generate short clips by combining the subtitle context with an LLM.