Why AI Dubbing Is More Opportunity Than Threat

Illustration: Cheyne Gateley/Variety VIP+

Amid Hollywood’s double strikes, the usefulness of generative AI tools in the creative process has been fiercely debated across the larger entertainment industry. But a less contentious, lower-risk application could emerge in AI dubbing, which is expected to become a more significant contributor to content localization efforts.

Demand for entertainment content localization has grown in recent years as film and TV distribution globalizes with the rise of streaming. As companies have ramped up content spending to fill streaming service libraries, there is simply more content that needs to be localized, whether with dubbing or subtitles.

Netflix has notably ramped up its localization efforts. Likewise, Disney’s overall content spend has risen in recent years, growing to $33 billion in 2022 from $25 billion in 2021, according to the company’s VP of distribution operations, Andrew Aherne. And it’s no secret that local-language content can perform extremely well outside its home market, regardless of viewers’ native language, with proven hits like Korean drama “Squid Game.”

But even as demand and localization spend have grown, it hasn’t been economically viable to dub content at scale. In many lower-resource markets, dubbing simply does not happen. Hollywood studios and streaming services have limited resources to commit to producing dubbed versions of content for smaller markets. Likewise, some regions may have little or no operational dubbing capacity for low-resource languages (e.g., Vietnamese).

“That’s why you get such poor dubs in a lot of the low-resource languages, because the higher-resource languages and bigger markets get a lot more attention and go through a lot more quality control and rounds of iteration. There’s a lot more budget to spend on them,” said Zohaib Ahmed, founder of Resemble AI.

All of this suggests that the market opportunity, and the potential unfilled demand for entertainment-grade content localization, is much bigger than what has previously been possible to serve.

AI dubbing could help fill the gap and, in turn, unlock the ability to dub net-new content and distribute it into net-new markets that were previously inaccessible due to cost, time and even human constraints. This could “supercharge” a piece of IP by expanding the set of strategically important markets into which its dubs can be sent. At the very least, consumers don’t appear bothered by the idea of watching content dubbed with AI.

Although preferences differ, many consumers in major European and LATAM markets prefer dubbed content over subtitles, per data from decision intelligence company Morning Consult.

AI startups including Deepdub, Papercup, Resemble AI and ElevenLabs have been working to offer dubbing, whether as a managed service or a self-service tool. Many already work with customers in entertainment, including domestic and international studios, streaming services, game publishers and creators.

The opportunity these companies envision is less about replacing existing voice actor talent or removing human expertise from dubbing workflows altogether. For many, the bigger opportunities lie in dubbing films and shows for new markets and languages that were previously off limits, in library content and back catalogs that were never dubbed, and even in customizing programming for FAST channels.

“Localization is the biggest bucket that we’re working on for film and TV,” said Ofir Krakowski, CEO and co-founder of Deepdub. “It’s mostly library content that was not currently dubbed because it was not economically viable to dub.”

Further, while most content is presently dubbed or subbed from English, AI dubbing could open more capacity to dub non-English-language content into English, or even from one non-English language into another. For example, Krakowski indicated Deepdub had dubbed “hundreds of hours of content and dozens of TV series from international content into English.”

Localization quality has sometimes suffered in these cases due to resource challenges and limited capacity. English dubbing, meaning dubbing from other languages into English, is a newer consideration for dubbing providers. Providers are also challenged to find translators and voice actors who can work directly between two lower-resource languages, because certain “language pairs” (e.g., Norwegian/Thai or Mandarin/Spanish) are rare. More often, English serves as the “bridge” language: to go from Korean to Thai, for example, a provider may have to find translators to go from Korean to English and then from English to Thai.
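
The bridge workflow is easy to picture in code. Below is a minimal, illustrative Python sketch of pivot translation through English; the translate() function and its toy lookup tables are hypothetical stand-ins for a real machine-translation model or service, not any provider’s actual pipeline.

```python
# A minimal, self-contained sketch of "bridge" translation through English.
# The tiny lookup tables are toy stand-ins for a real machine-translation
# step; all names here are hypothetical, not any vendor's API.

TOY_MT = {
    ("ko", "en"): {"안녕하세요": "Hello"},
    ("en", "th"): {"Hello": "สวัสดี"},
}

def translate(text: str, source: str, target: str) -> str:
    # Look up the (source, target) pair; a real system would run a model here.
    table = TOY_MT.get((source, target))
    if table is None or text not in table:
        raise ValueError(f"No translation path for {source}->{target}: {text!r}")
    return table[text]

def bridge_translate(text: str, source: str, target: str, bridge: str = "en") -> str:
    # Direct translators for a rare pair (e.g., Korean/Thai) may not exist,
    # so pivot through a high-resource bridge language instead.
    intermediate = translate(text, source=source, target=bridge)
    return translate(intermediate, source=bridge, target=target)

if __name__ == "__main__":
    # Korean -> English -> Thai, since a direct ko->th path is unavailable.
    print(bridge_translate("안녕하세요", source="ko", target="th"))
```

In practice, each hop through the bridge language can lose nuance, which is one reason human review remains in the loop for entertainment-grade work.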

Amid surging demand, the localization industry, which spans a network of regional language service providers, is facing a shortage of qualified translators. More content is being produced than ever, and it needs to be localized from more source languages into more target languages than ever before.

Meanwhile, AI voices can make casting more manageable, particularly for low-resource languages where it can be difficult to find actors whose voice profiles resemble the original performers’.

AI dubbing competes with traditional dubbing on price and efficiency. AI film dubs can be produced far faster and cheaper than with traditional methods: cost reductions have been variously estimated at 30% to 40% for a high-end production, or as much as 10x on average, while turnaround has been estimated at roughly 5x faster than traditional workflows.

Naturally, the prospect of AI dubbing raises salient questions about the role of voice actors, translators and other participants in existing localization workflows. But talent displacement due to AI dubbing might be minimal, or at least minimized, as AI companies position themselves to complete dubs that wouldn’t otherwise happen.

Most AI dubbing platforms also need humans in the loop to conduct quality control for high-quality or theatrical-grade dubs. Automation quality isn’t perfect, and may never be, meaning human review and adjustment is still needed to improve the output. Platform interfaces enable manual corrections, so inaccurate translations or vocal performances can be fine-tuned to capture the cultural or verbal nuances of a given region that the automated pass missed. For example, a literal translation of a reference to the convenience store “7-Eleven” wouldn’t be understood in markets where the chain doesn’t exist. This process often involves regional language experts, directors, audio engineers, casting directors, writers and sometimes voice actors.

But the systems have limitations. Notably, they’re not yet controllable enough to deliver the level of vocal nuance that entertainment-grade dubs often require. Additional control levers will likely emerge over the coming years, potentially letting users manually modify intonation, inflection, stresses, degree of emotion (e.g., happier), pace (e.g., faster), age (e.g., older) or even gender.

Further, in some markets, specific voice talent is often recognized as the consistent local voice of a particular actor (for example, the same performer always dubbing Tom Hanks or Tom Cruise). In such cases, using a synthetic voice in lieu of the established local actor would cut against consumer expectations.

It’s also possible that voice actors themselves will begin working with these companies, potentially cloning their voices for licensed uses in content, much as many on-screen actors are creating licensed personal avatars.

Ultimately, AI dubbing could become a more accepted use case for generative AI, filling necessary gaps where the opportunity is greatest and where financial and human resources are already constrained. If this level of demand becomes the norm, AI dubbing could prove a viable industry-wide solution.
