SeamlessM4T: here is universal translation according to Meta AI

From Mark Zuckerberg's company comes the first multilingual and multimodal translation and transcription model, handling as many as 100 languages

Meta's office in Austin, Texas: the AI division of Mark Zuckerberg's company works on universal translation (Photo: Meta)

There are nearly 3,500 living languages in the world that are primarily spoken and have no standard or widespread writing system among their speakers. Yet, to date, Big Tech's remarkable efforts at real-time translation, which have always invoked the Babel Fish born from the blessed pen of Douglas Adams, have focused on written languages.

Perhaps that is because developing a technology capable of translating spoken language, without tons (or terabytes) of written pages available for training an artificial intelligence model, is a challenge that requires a step into the void, going beyond standard techniques to some extent.

Meta AI has taken up that challenge, presenting ahead of Google the first all-in-one multimodal (and multilingual) translation and transcription model. It is called SeamlessM4T, and it is the latest piece of the Universal Speech Translator, an ambitious long-term project born to “break down language barriers in the physical world and in the metaverse”.

The document “SeamlessM4T – Massively Multilingual & Multimodal Machine Translation” by Meta (in English)

Meta's universal speech-to-speech translation breaks new ground in the Metaverse as well (Photo: Envato)

Universal translation: in the beginning there was the Babel Fish

"The Babelfish is small, yellow, resembles a leech and is perhaps the strangest thing in the Universe”: when stuck in someone's ear, it allows you to instantly understand anything, in any language. In the Hitchhiker's Guide to the Galaxy, the babel fish allows Arthur Dent to understand the language of the Vogons, repulsive aliens intent on destroying the Earth for the construction of an intergalactic highway.

The reference to Douglas Adams's novel is almost obligatory when it comes to universal translation: AltaVista's pioneering machine translation service was called Babel Fish, and Google, too, used the image of a small yellow fish for its translation services.

When, in 2016, Google Translate updated its algorithms to try to understand the meaning of sentences before translating them, people began to say openly that the Babel Fish was close to becoming a reality.

Seven years later, we are almost there. Meta AI has just unveiled what appears to be, to date, the most faithful rendition of the little yellow fish: it is called SeamlessM4T, and it is the first multilingual and multimodal AI model for translation and transcription. That is: software able to translate (and transcribe) around 100 languages, including starting from spoken language.


There are over 3,500 languages in the world that are primarily spoken: Meta AI's speech-to-speech translation project works to leave none of them behind (Photo: Envato)

Universal Speech Translator: this is Meta's project

The new translation system is part of the Universal Speech Translator project, which in 2022 saw the birth of the first speech-to-speech translator for Hokkien, a language spoken in eastern China that does not have a widespread standard writing system.

At the time, Mark Zuckerberg presented the project in a multilingual conversation with a Hokkien-speaking colleague, announcing the birth of the first AI system capable of translating a language in real time starting from speech.

In most automatic speech-to-speech translation software, the spoken language is first converted into text, then translated into the output language, and finally transformed back into sound. This procedure, Meta explains, “makes speech-to-speech translation dependent on textual form, in ways that make it very difficult to apply the technology to languages that are primarily spoken”.
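As a toy illustration of that cascaded chain, here is a minimal sketch with stub stages; every function name and the tiny dictionary are invented for illustration and are not Meta's API:

```python
# Toy sketch of the cascaded "speech -> text -> translated text -> speech"
# pipeline described above. Each stage is a stub standing in for a real model.

def asr(audio: str) -> str:
    """Speech recognition stub: pretend the audio is already a transcript."""
    return audio  # a real system would run a speech-to-text model here

def translate_text(text: str, tgt_lang: str) -> str:
    """Machine translation stub with a tiny hard-coded dictionary."""
    toy_dict = {("hello", "fr"): "bonjour"}
    return toy_dict.get((text, tgt_lang), text)

def tts(text: str) -> bytes:
    """Text-to-speech stub: a real system would synthesize a waveform."""
    return text.encode("utf-8")

def cascaded_s2st(audio: str, tgt_lang: str) -> bytes:
    # Three separate models chained together: each hop adds latency and,
    # crucially, requires the language to have a written form at all.
    transcript = asr(audio)
    translation = translate_text(transcript, tgt_lang)
    return tts(translation)
```

The dependency on the intermediate text is exactly what the sketch makes visible: if a language has no writing system, the middle stage has nothing to work with.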

Meta's universal translation project aims at direct speech-to-speech translation, without passing through inefficient intermediate transcriptions that, among other things, risk leaving thousands of spoken languages behind.

The Universal Speech Translator project, launched early last year, is presented by Meta as a long-term commitment, and has already produced several efforts pursuing the same goal: developing a new artificial intelligence model able to learn languages from only a few examples, and therefore to support languages that lack a standard writing system or for which few texts exist.

Meta calls this a “speech-based approach”, and promises that a technology of this kind “can pave the way for much faster and more efficient translation systems”, since it skips the steps of converting speech to text and back again.


Meta's London offices: the company gives employees the ability to customize their workspace (Photo: Meta)

SeamlessM4T and the state of the art: Meta's leap forward

SeamlessM4T, too, is part of the ambitious long-term commitment that sees Mark Zuckerberg's company engaged in universal translation “for a more connected and inclusive world”. The new technology developed by Meta AI is multimodal: it can translate from either speech or text input and render the translation in either form, all within a single model.

Currently, explains Paco Guzmán, Research Scientist Manager at Meta AI, it “supports nearly 100 languages for text and 35 for speech translation” (plus English), and it can also detect when speakers switch languages mid-conversation.

SeamlessM4T can already translate 36 languages in speech-to-speech mode, that is, voice only, and will be released under a CC BY-NC 4.0 license, which is open to universities and researchers but does not permit commercial use of the technology, unlike OpenAI's Whisper model.

The comparison with OpenAI helps measure Meta's progress relative to the state of the art: SeamlessM4T-Large, the larger automatic speech recognition model, uses 2.3 billion parameters, while the large version of Whisper stops at 1.55 billion. The same goes for the lighter models: 281 million parameters versus the 39 million of the OpenAI model.
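A quick back-of-the-envelope check of those figures (parameter counts as quoted above; the model labels are shorthand, not official names):

```python
# Parameter counts quoted in the article, in raw parameter units.
params = {
    "SeamlessM4T-Large": 2.3e9,   # 2.3 billion parameters
    "Whisper large":     1.55e9,  # 1.55 billion parameters
    "SeamlessM4T light": 281e6,   # lighter Meta model
    "Whisper light":     39e6,    # lightest OpenAI model
}

# How many times larger the Meta models are at each tier.
large_ratio = params["SeamlessM4T-Large"] / params["Whisper large"]
small_ratio = params["SeamlessM4T light"] / params["Whisper light"]

print(f"Large tier: {large_ratio:.2f}x")  # ~1.48x
print(f"Light tier: {small_ratio:.2f}x")  # ~7.21x
```

Parameter count alone is, of course, a rough proxy: it says how big the models are, not how accurate they are.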


Mark Zuckerberg on stage at Facebook F8, Meta Platforms' annual event for developers and entrepreneurs (Photo: Meta)

From Mark Zuckerberg, a single but multitasking system

While the sales offices prepare for the launch of Meta Quest 3, the new virtual reality headset, Meta AI's researchers continue successfully down the path of AI-powered universal translation.

The model described in the paper is based on the UnitY multitask architecture, which can directly generate text and speech translations thanks to three main sequential components, pre-trained to ensure model quality and training stability.

- The speech and text encoders have the task of recognizing the input language;
- the text decoder then transfers the translated meaning into text, which reaches the text-to-unit model that converts it into “discrete acoustic units”;
- finally, these units are transformed into sound by a vocoder.
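Under that description, the flow through the components can be sketched as follows; every class here is an illustrative stub, not Meta's implementation of UnitY:

```python
# Minimal sketch of a UnitY-style direct speech-to-speech flow as described
# above. All classes are stubs invented for illustration.

class SpeechEncoder:
    def encode(self, audio: list) -> list:
        # A real encoder would map raw audio to hidden representations;
        # here the values simply pass through.
        return audio

class TextDecoder:
    def decode(self, hidden: list, tgt_lang: str) -> str:
        # Stands in for autoregressive decoding into the target language.
        return f"[{tgt_lang} translation of {len(hidden)} frames]"

class TextToUnit:
    def to_units(self, text: str) -> list:
        # Maps the decoded text to "discrete acoustic units".
        return [ord(c) % 100 for c in text]

class Vocoder:
    def synthesize(self, units: list) -> bytes:
        # Turns discrete units into a waveform (here: raw bytes).
        return bytes(units)

def unity_s2st(audio: list, tgt_lang: str) -> bytes:
    """One model, one pass: no separate ASR transcript of the source
    audio is produced, unlike the cascaded pipeline."""
    hidden = SpeechEncoder().encode(audio)
    text = TextDecoder().decode(hidden, tgt_lang)
    units = TextToUnit().to_units(text)
    return Vocoder().synthesize(units)
```

The point of the sketch is the single chained forward pass: encoder, decoder, text-to-unit model, and vocoder are components of one system rather than independently deployed models.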

"Compared to approaches that use separate templates, SeamlessM4T's single system approach reduces errors and delays, improving the efficiency and quality of the translation process”, we read in Meta's blog.

"Building a universal translator, like the Hitchhiker's Guide to the Galaxy's Babel Fish, is a great challenge because existing speech-to-speech and speech-to-text systems cover only a small portion of the world's languages”, explains the blue Big Tech in the same post, “We believe this is a significant step forward in this journey".


Meta AI's SeamlessM4T is the first multilingual and multimodal translator capable of recognizing 100 different languages (Photo: Envato)