AWS Machine Learning Blog
Subtitling videos accurately and easily with CaptionHub and AWS
This is a guest post from Graham Pengelly, CTO, and James Jameson, Commercial Lead, at CaptionHub. CaptionHub is a London-based company that focuses on video captioning and subtitling production for enterprise organizations.
While the act of captioning—that is, taking video files and making sure the text on the screen reflects what’s being said accurately and is timed appropriately—seems simple at the outset, there is more complexity than meets the eye.
When we embarked on building CaptionHub in 2015, we were a design agency producing video effects and commercials for clients, including a massive tech company in California. They wanted us to localize their video—to their high standards, of course—and do it on the tight schedule of a global consumer tech release.
To meet our client’s needs, we found ourselves building a new software tool to manage linguists, provide collaborative subtitling, and make subtitles frame-accurate. To speed up the process, we then added AI called Natural Captions Technology, an algorithmic approach to natural language processing that reflects the natural language of humans.
From this starting point, we recognized a widespread need for a solution like the one we had created. We broadened the types of media we handled from marketing and internal communications assets to high-value global output ready for any viewer or listener worldwide.
With CaptionHub today, we take recorded video and create perfect subtitles, fast. We generate subtitles using automatic speech recognition to massively speed up the first cut. Then, we make sure that subtitles are timed perfectly (“frame-accurate,” in our lingo), on the belief that subtitling should be a seamless part of the production workflow. We also provide automated and human-enabled translation to localize video for any audience. Now, with the help of AWS, we can do that for live video streams and on-demand video.
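To illustrate what "frame-accurate" timing can mean in practice, here is a minimal Python sketch that snaps a speech-recognition timestamp to the nearest video frame boundary and formats it as an SRT-style timestamp. This is not CaptionHub's implementation; the function names `snap_to_frame` and `to_srt_timestamp` are hypothetical, and the sketch assumes a constant frame rate.

```python
from fractions import Fraction


def snap_to_frame(seconds: float, fps: Fraction) -> Fraction:
    """Round a timestamp to the nearest frame boundary (hypothetical helper).

    Exact rational arithmetic avoids drift with rates such as 30000/1001.
    """
    # Treat the input as a decimal value with at most microsecond precision.
    exact = Fraction(seconds).limit_denominator(1_000_000)
    # round() on a Fraction returns an int; exact ties go to the even frame.
    frame_index = round(exact * fps)
    return Fraction(frame_index) / fps


def to_srt_timestamp(t: Fraction) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (SRT-style)."""
    total_ms = round(t * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


# Assumed example: a cue the recognizer placed at 1.03 s in a 25 fps video.
start = snap_to_frame(1.03, Fraction(25))
print(to_srt_timestamp(start))
```

At 25 fps each frame lasts 40 ms, so a cue reported at 1.03 s moves to frame 26, which begins at 1.04 s. Passing the frame rate as a `Fraction` keeps the arithmetic exact for rates such as 29.97 fps (30000/1001).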
With AWS, we can provide an enterprise localization platform for the most demanding of our clients, regardless of their use case. AWS technology spans our servers and low-level infrastructure decisions up to the engines we choose for speech recognition, machine translation, and the sharp-end value points that delight our customers.
On the artificial intelligence and machine learning side, we use Amazon Translate and Amazon Transcribe for smooth, real-time captioning across dozens of languages. AWS has been a crucial inspiration for our newest offerings.
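As a rough sketch of how subtitle cues might be passed through Amazon Translate while keeping their timing intact, consider the following Python function. It is an illustration under assumptions, not a description of CaptionHub's pipeline: the cue dictionary layout and the `translate_cues` name are hypothetical, and the `client` argument is assumed to be a boto3 Amazon Translate client (for example, one created with `boto3.client("translate")`), whose `translate_text` call returns the result under the `TranslatedText` key.

```python
from typing import Any


def translate_cues(
    cues: list[dict],
    client: Any,
    source_lang: str,
    target_lang: str,
) -> list[dict]:
    """Translate the text of each cue, leaving start/end times untouched.

    `client` is assumed to expose translate_text() as the boto3
    Amazon Translate client does; the cue layout is hypothetical.
    """
    translated = []
    for cue in cues:
        response = client.translate_text(
            Text=cue["text"],
            SourceLanguageCode=source_lang,
            TargetLanguageCode=target_lang,
        )
        # Copy the cue so the original list is not mutated.
        translated.append({**cue, "text": response["TranslatedText"]})
    return translated
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object during testing, without credentials or network access.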
We use a variety of other AWS services that are critical to our infrastructure and application architecture. AWS Elemental MediaLive handles the input streams for CaptionHub live, while AWS Elemental MediaPackage handles the output streams, combining captions and video/audio. With all of this orchestrated together, we use Amazon CloudWatch to monitor our AWS infrastructure.
With this AWS-based setup, we’re unstoppable. We’re able to scale up and down however and whenever we need to. AWS has allowed us to vastly accelerate our mission to help organizations localize their media.
Our customers have reported huge savings in workflow time, up to an 800% increase in production for captions and subtitles using automatic speech recognition, which takes advantage of the same tech behind Alexa. That amounts to a significant financial return, even for the world’s largest and best-funded production and marketing departments.
We live in a world that communicates with video. When our clients’ production values, combined with their potential to reach audiences, quite literally define their brand, it’s no wonder they want to maintain that winning edge. With CaptionHub’s captioning solutions, made possible by AWS, we can ensure that organizations reach audiences in their language, quickly and perfectly, on any device, wherever they are.