SeamlessM4T is billed as the first ‘all-in-one’ model that can both comprehend and translate speech and text.
Meta, the parent company of Facebook, has released an artificial intelligence-powered translation engine that can translate both text and speech.
The translation tool, known as SeamlessM4T, is the “first all-in-one multilingual multimodal AI transcription and translation model,” according to Meta.
A multimodal engine can take language in as either speech or text and produce its translation as either text or speech.
Depending on the task, SeamlessM4T can translate up to 100 languages from speech to speech, speech to text, text to speech, and text to text.
According to Meta, SeamlessM4T’s single-system approach reduces errors and delays and improves the efficiency and quality of the translation process compared with approaches that use separate models.
Meta added, “This makes it easier for people who speak different languages to communicate with one another.”
The AI-driven translation market is flourishing.
According to India-based Acumen Research and Consulting, the size of the global machine translation market is predicted to increase from $812.6 million in 2021 to over $4.1 billion in 2030.
The practice of translating text or speech into another language using software is known as machine translation.
According to Meta, SeamlessM4T is being made publicly available under a research license so that researchers and developers can build on the work. The company has also released the metadata for SeamlessAlign, which it calls the largest open multimodal translation dataset to date, comprising 270,000 hours of mined speech and text alignments.
The new engine supports speech recognition for nearly 100 languages and speech-to-text translation for nearly 100 input and output languages. For speech-to-speech translation, it accepts nearly 100 input languages and can produce output in 36 languages, including English.
It also supports text-to-text translation for nearly 100 languages, and text-to-speech translation from nearly 100 input languages into 35 output languages, including English.
According to Meta, SeamlessM4T is part of its effort to build a universal translator.
Last year, Meta introduced No Language Left Behind (NLLB), a text-to-text machine translation model covering 200 languages. It is now one of the translation services available on Wikipedia.
Last October, the company released what it described as the first AI-powered speech-to-speech translation system for a primarily oral language. The system was developed as part of Meta’s Universal Speech Translator project, which aims to build AI systems that can translate speech to speech in any language.
Earlier this year, the company unveiled Massively Multilingual Speech (MMS), which provides speech recognition, language identification, and speech synthesis technology spanning more than 1,100 languages.
According to Meta, SeamlessM4T draws on the results of all of these projects to deliver a multilingual and multimodal translation experience from a single model, built on a wide variety of spoken data sources and achieving state-of-the-art results.
SeamlessM4T also handles code-switching, which occurs when a multilingual speaker mixes several languages in their speech. This lets the engine recognize and translate multiple languages even when they are mixed together within a single sentence.