Google’s one step closer to building its 1,000-language AI model

Google has all kinds of AI tech in development, including the Universal Speech Model, which is part of its attempt to build a model that can understand the world’s 1,000 most-spoken languages.

Google logo and black swirls
Illustration: The Verge

Microsoft and Google may be butting heads over whose AI chatbot is better, but chatbots aren’t the only use for machine learning and language models. Along with rumored plans to show off more than 20 products powered by artificial intelligence during its annual I/O event this year, Google is progressing toward its goal of building an AI language model that supports 1,000 different languages. In an update posted on Monday, Google shared more information about the Universal Speech Model (USM), a system it describes as a “critical first step” in realizing its goals.

Last November, the company announced its plans to create a language model supporting 1,000 of the world’s most-spoken languages and revealed USM at the same time. Google describes USM as “a family of state-of-the-art speech models” with 2 billion parameters trained on 12 million hours of speech and 28 billion sentences of text across more than 300 languages.

USM, which YouTube already uses to generate closed captions, also supports automatic speech recognition (ASR). According to Google, the model automatically detects and translates languages including English, Mandarin, Amharic, Cebuano, and Assamese.

Right now, Google says USM supports over 100 languages and will serve as the “foundation” for an even more expansive system. Meta’s working on a similar AI translation tool that’s still in the early stages. You can read more about USM and how it works in the research paper Google posted.

One destination for the technology could be augmented-reality glasses like the concept Google showed off during its I/O event last year, which could detect speech and provide real-time translations that appear right before your eyes. That still seems a bit far off, though, and Google’s misrepresentation of the Arabic language during I/O shows how easy it can be to get something wrong.