How the PGA TOUR uses machine learning and generative AI from AWS to enhance Media Asset Management

This blog was co-written by Byron Chapman, Director Media Asset Management & Media Workflows, PGA TOUR and Andres Carjuzaa, Co-Founder & CTO, Around.

The PGA TOUR is the world’s premier membership organization for touring professional golfers. It co-sanctions tournaments on the PGA TOUR along with several other developmental, senior, and international tournament series.

Housing, managing, and enhancing a scalable and content-rich Media Asset Management (MAM) system is important to the TOUR, as it is across most sports organizations. The PGA TOUR MAM not only preserves the history of the game (with content dating back to 1920), but also provides over 400 content creators and partners with the footage they need to create new stories and bring context to golf’s most exciting moments.

The continued journey of the PGA TOUR MAM

In 2022, the TOUR completed the migration of its MAM system to Amazon Web Services (AWS), along with 9+ petabytes of content to Amazon Simple Storage Service (Amazon S3). With replication across regions and continued ingest of new content (nearly 22,000 hours of golf content produced each year), the MAM archive is now upwards of 25 petabytes in size and continues to grow ~2 petabytes per year. The PGA TOUR MAM migration to the cloud significantly reduced the time to locate and retrieve content while also greatly expanding who could access the MAM directly. Now, broadcast production teams and content creators can easily find historic clips on demand, and they can do so from anywhere in the world.

A major business driver in migrating the MAM to AWS was for the TOUR to take advantage of additional cloud-based tools to further enhance its 215,000+ hours of golf content, which included enhancing the metadata and searchability of their assets. At the end of 2023, the TOUR’s MAM had 9 million log entries, which consisted of data describing golf action for a particular asset. This data is associated through a combination of automated and manual logging, including associating scoring data produced by ShotLink, the TOUR’s proprietary scoring solution that captures and distributes data for every golf shot during tournament competition. The ability to search based on commentary during telecast and interview conversations was limited to what a content logging team of seven was able to manually transcribe. Some player interviews and things like “walk and talks” were logged, but what a logger could produce in an 8-hour window was limited. In an effort to streamline speech-to-text transcription, the TOUR turned to Amazon Transcribe, a machine learning service that automatically converts speech to text.

The TOUR’s Media team engaged with their preferred media solutions development team, Around, to enhance the MAM workflow by using Amazon Transcribe to convert commentary and conversation to text, enriching the metadata attached to the media assets. The initial focus was to set up workflows to run all new content ingested to the MAM through Transcribe, followed by historical footage. By the end of the first five months of processing historical transcriptions, the TOUR’s metadata count had grown to 16 million entries (6.5M new Transcribe created entries).

Figure 1: View of the TOUR’s MaM using the speech-to-text advanced search function to find entries containing Transcribe logs of “What a shot”.

How does it work?

One of the challenges the TOUR had to solve for was the length of its video content. Golf coverage during a tournament is long—up to 12 hours each day. Transcribe currently has a limit of processing 4 hours of content at a time. The TOUR and Around solved for this by building a container-based workflow engine to chunk out the mezzanine video into smaller MP3s. AWS Elemental MediaConvert is then used for video processing, and AWS Lambda is used to kick off the speech-to-text transcription via Transcribe. Once the smaller transcription jobs process, the content is stitched back together as a JSON file using timecodes, and associated to the MAM asset. This process searches for new content ingested every 4 hours, allowing for new content to be enriched daily.

Figure 2: PGA TOUR MAM + transcription processing architecture.

The MAM live ingest and processing architecture is simplified in the previous diagram to emphasize the transcription workflow to enhance MAM asset metadata. The workflow engine and transcriber components are Docker containers running on Amazon EC2.

The Item Collector API gets items available in the MAM for defined periods of either recently created or historical and creates a record in the state database;
The Orchestrator API searches for items based on period and marks ready to process or marks item delete if not present in the MAM;
The Transcriber API evaluates items ready to process and calculates required file splitting based on Amazon Transcribe batch processing capability, then invokes MediaConvert to split the files and post result to SQS;
Once file splitting is complete (SQS event), Lambda triggers Amazon Transcribe to produce JSON payload aligned to referenced time codes;
The transcribed payload is then passed to Amazon Bedrock with a prompt to use generative AI to look for and correct misspelled words (e.g., player and entity names) without adding any additional text;
The completed transcription payload is posted back to the workflow engine, which associates the chunk to the appropriate asset as additional metadata;
The process includes steps for retry or error handling if needed with the state of each step tracked in the state database.

To further enhance the quality of the metadata generated, the TOUR decided to also incorporate Amazon Bedrock into its workflows. This allows for better data accuracy for golf terms like “birdies”, and “bogies”, equipment names, as well as more accurate and consistent spelling of entity names such as player names and sponsors. Integrating Bedrock not only increases data accuracy, it is a simpler solution to implement and maintain compared to creating a custom vocabulary or custom language model in Transcribe.

Figure 3: Zoomed View of Transcription Workflow w/ Bedrock.

Anthropic’s Claude 3 Sonnet foundation model was chosen for its performance, price, and results as compared to other models. The following prompts are used to guide the model:

You are a proofreader who works for a golf media company;
You are given a raw golf commentary transcript;
You have to fix the transcript. Here are the things you have to fix: the language, the grammar, the names of players, company names, brand names, golf-specific terms;
Note: Do not add text or remove text. Rather replace misspelled text with corrected versions of the text. Also add punctuation so the sentences make sense;
Note: Provide the revised text directly as json with the same structure as the request without any introductory statements and respect the same position in the original array;
Here is the raw transcript. An array with {itemsCount} elements of JSON object whose only fields are ‘Content’ and ‘Id’;

The transcript from Transcribe is split into chunks to stay within Claude 3 Sonnet’s output token limit of 4096 tokens. Each chunk is sent in with the prompt into Bedrock. Bedrock responses are validated using word counts. If a response chunk has a highly deviated word count, the original Transcribe chunk is used, and the response chunk is discarded.

Applying the automation with speech to text increased asset metadata by millions of entries for broadcast assets, interviews, and other speech-to-text use cases across the entire MAM, instead of only a small portion performed manually before. This also leads to significant time savings, where an 8-hour day of manual transcription is now transcoded automatically in 1-2 hours. Now that all new and historical content is transcribed, post-production and content teams have the opportunity to find the most exciting and relevant moments in golf and bring them forward for fans.

What does the future hold?

While the TOUR’s MAM has gone through a major transformation over the past several years, there are still plenty of additional opportunities to expand and improve it. Golf is a global sport and translation is also in consideration to allow international teams and partners the ability to search the TOUR’s MAM in their native language. This will expand the ability for content creators to gather additional footage and exciting moments for players that are popular within a specific region or country.

Another area of interest is object and logo detection, which would enable a greater ability to leverage the MAM for monetization opportunities. For example, the ability to provide a sponsor or partner with data that quantifies how many times their brand or logo is shown during a specific tournament may be valuable. Similarly, data transcription can help with sales pitches and ROI discussions by determining how many times a brand is mentioned or discussed during a broadcast window.

As the TOUR continues to expand its MAM capabilities, it enables opportunities to enrich both present and future event broadcasts, enhance storytelling for domestic and international platforms, and increase revenue opportunities through monetization. With more and more enriched metadata becoming available every day, the PGA TOUR, with support from AWS, is enriching content for golf fans around the world.

AWS for M&E Blog

How the PGA TOUR uses machine learning and generative AI from AWS to enhance Media Asset Management

The continued journey of the PGA TOUR MAM

How does it work?

What does the future hold?

Resources

Follow

Learn

Resources

Developers

Help