
Zuckerberg reveals Meta's "small goals": AI that auto-generates metaverse worlds and translates all languages in real time

Report from Heart of the Machine

Editors: Zenan, mayo

Will these become the killer apps of the metaverse era?

Meta is researching AI that can generate metaverse worlds from speech, among other striking technologies. CEO Mark Zuckerberg said Wednesday that the company is exploring ways to make people's conversations with voice assistants flow more smoothly and to translate between different languages.

Zuckerberg has recently been steering Facebook all-in on the metaverse, predicting that in the future people will work, socialize, and play in virtual worlds, an environment he believes will eventually succeed the internet.

As for how the metaverse and virtual reality can truly immerse people, "the key to unlocking these advances is artificial intelligence," Zuckerberg said.


Making cooking look as easy as it is in The Sims.

Meta is working on a new class of generative AI models that let people describe a world and have the AI generate aspects of it. In yesterday's demo, Zuckerberg showed off an AI concept called Builder Bot: people appear as 3D avatars on an island in the metaverse and issue voice commands to create a beach, and the system follows further commands to change the weather at any time, adding trees and even picnic blankets.


A beach, an island, coconut trees: whatever scene you want, just tell the AI and you'll have it.

"As we push this technology further, you'll be able to create the world of your dreams, explore and share experiences with others with your voice." Zuckerberg didn't set a timeline for the plans, though, nor did he provide more details about how Builder Bot works.

The other thread is speech technology. Meta says it is working on AI that lets people hold more natural conversations with voice assistants, a step toward smooth human-AI communication in the metaverse. Zuckerberg said the company's Project CAIRaoke is "a complete end-to-end neural model for building on-device assistants."

In Zuckerberg's presentation, CAIRaoke featured in a "very practical" household scene: as a person cooked a stew, the voice assistant warned that salt had already been added to the pot. The assistant also noticed that the salt was running low and prompted the cook to get more.


In a follow-up blog post, Meta researchers explained the technology behind CAIRaoke. The traditional approach to AI voice assistants requires four sets of inputs and outputs, one for each stage of the pipeline: natural language understanding (NLU), dialog state tracking (DST), dialog policy (DP), and natural language generation (NLG). It also requires defining standards for each stage's inputs and outputs. For NLU, for instance, a traditional conversational AI system needs a predefined ontology of intents and entities.

The new model Meta proposes does not prescribe a conversation flow at all; a single set of training data suffices. CAIRaoke thus reduces the effort needed to add new domains. In the canonical approach, expanding into a new domain means building and tuning each module in sequence before the next one can be reliably trained: if NLU and DST are changing every day, you cannot effectively train a DP. A change to one component can break the others, forcing all downstream modules to be retrained. This interdependence slows progress.
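To make the contrast concrete, here is a minimal, hypothetical Python sketch of the traditional four-stage pipeline the post describes. All function bodies are toy stand-ins, not Meta's code; the point is that each stage's inputs and outputs form a hand-defined contract, so changing one stage invalidates everything downstream.

```python
# Toy sketch of the classic NLU -> DST -> DP -> NLG assistant pipeline.
# Each interface below is a hand-defined contract between modules.

from dataclasses import dataclass, field

@dataclass
class DialogState:
    """Slots tracked across turns, e.g. {'salt_added': True}."""
    slots: dict = field(default_factory=dict)

def nlu(utterance: str) -> dict:
    """NLU: map text onto a fixed intent/entity ontology."""
    if "salt" in utterance:
        return {"intent": "add_ingredient", "entity": "salt"}
    return {"intent": "unknown", "entity": None}

def dst(state: DialogState, parse: dict) -> DialogState:
    """DST: fold the parse into the running dialog state."""
    if parse["intent"] == "add_ingredient":
        state.slots[f"{parse['entity']}_added"] = True
    return state

def dp(state: DialogState) -> str:
    """DP: choose the next system action from the state."""
    return "warn_duplicate" if state.slots.get("salt_added") else "acknowledge"

def nlg(action: str) -> str:
    """NLG: render the chosen action as text."""
    return {"warn_duplicate": "Careful, you already added salt.",
            "acknowledge": "Got it."}[action]

# Changing NLU's ontology forces DST, DP, and NLG to be rebuilt and
# retrained in sequence. An end-to-end model collapses all four stages
# into one network trained directly on (dialog history -> response)
# pairs, so there are no intermediate contracts to keep in sync.
state = DialogState()
print(nlg(dp(dst(state, nlu("I just put salt in the pot")))))
```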

Meta's end-to-end approach removes the dependence on upstream modules, speeds up development and training, and allows models to be fine-tuned with less data.

Meta says it already uses the model in its video-calling device Portal and looks forward to integrating it into augmented reality (AR) and virtual reality (VR) hardware. In an interview with Reuters, Jérôme Pesenti, Meta's vice president of AI, said the company is strictly limiting the responses of its new CAIRaoke-based assistant until it can ensure the system does not produce offensive language.

"These language models are powerful... We're trying to figure out how to control it," Says Pesenti.

Zuckerberg also announced that Meta is developing a universal speech translator designed to provide instant speech-to-speech translation across all languages. The company had previously set the goal of having its AI systems translate all written languages.

"Being able to communicate with anyone in any language is a superpower that people dream of, and AI will achieve this in our lifetime." Zuckerberg said.

Although current translation tools handle widely spoken languages such as English, Mandarin, and Spanish well, about 20% of the world's population speaks languages these systems do not cover. These underserved languages often lack the easily accessible written corpora needed to train AI systems, and some have no standardized writing system at all.

Meta says it hopes to overcome these challenges by deploying new machine learning techniques in two specific areas. The first, called No Language Left Behind, focuses on building AI models that can learn to translate from fewer training examples. The second, the universal speech translator, aims to translate speech from one language directly into another in real time, without a written intermediary (text mediation is a common technique in many translation applications), as sketched below.
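The difference between the two routes can be shown in a few lines. Everything below is a toy stand-in (no real ASR, MT, or TTS models), meant only to show where the written intermediary appears in a cascaded system and disappears in a direct one.

```python
# Toy contrast: cascaded speech translation (through text) vs. direct
# speech-to-speech translation (no text anywhere in the path).

def speech_to_text(audio: list, lang: str) -> str:
    # Stand-in ASR; this step requires the language to have a written form.
    return "hola mundo" if lang == "es" else "hello world"

def translate_text(text: str, src: str, tgt: str) -> str:
    # Stand-in text MT with a tiny lookup table.
    table = {("es", "en"): {"hola mundo": "hello world"}}
    return table.get((src, tgt), {}).get(text, text)

def text_to_speech(text: str, lang: str) -> list:
    # Stand-in TTS; the "waveform" is just character codes here.
    return [ord(c) for c in text]

def cascaded_s2st(audio: list, src: str, tgt: str) -> list:
    # ASR -> MT -> TTS: both languages must be writable.
    return text_to_speech(translate_text(speech_to_text(audio, src), src, tgt), tgt)

def direct_s2st(audio: list, src: str, tgt: str) -> list:
    # Stand-in direct model: source speech units map straight to target
    # speech units, so an unwritten language can still be supported.
    unit_table = {("es", "en"): {(1, 2, 3): (40, 41, 42)}}
    return list(unit_table[(src, tgt)][tuple(audio)])

print(cascaded_s2st([1, 2, 3], "es", "en"))
print(direct_s2st([1, 2, 3], "es", "en"))
```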


Specifically, Meta is building advanced AI models that can learn languages from fewer examples, and it will use them to enable expert-quality translation for hundreds of languages, from Asturian to Luganda to Urdu. Meta is also building a universal real-time speech translator to support spoken languages that lack a standard writing system.

Building on LASER, its toolkit for language-agnostic sentence embeddings and automated dataset creation, Meta researchers have built systems such as CCMatrix and CCAligned that can find parallel text across languages on the internet. Because little data is available for low-resource languages, Meta created a new training method that lets LASER focus on specific language subgroups, such as the Bantu languages, and learn from smaller datasets.
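As a rough illustration of how this embedding-based bitext mining works (a toy sketch, not Meta's pipeline; the hand-made EMB vectors stand in for a real multilingual encoder like LASER):

```python
# Toy bitext mining: embed sentences from two languages into a shared
# space, then pair each source sentence with its nearest target sentence
# by cosine similarity, keeping only pairs above a threshold.

import numpy as np

# Stand-in shared embedding space: translations get nearby vectors.
EMB = {
    "the cat sleeps": np.array([1.0, 0.1, 0.0]),
    "el gato duerme": np.array([0.9, 0.2, 0.0]),
    "i like tea":     np.array([0.0, 1.0, 0.3]),
    "me gusta el té": np.array([0.1, 0.9, 0.2]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mine_bitext(src_sents, tgt_sents, threshold=0.8):
    """Pair each source sentence with its most similar target sentence."""
    pairs = []
    for s in src_sents:
        best = max(tgt_sents, key=lambda t: cosine(EMB[s], EMB[t]))
        if cosine(EMB[s], EMB[best]) >= threshold:
            pairs.append((s, best))
    return pairs

print(mine_bitext(["the cat sleeps", "i like tea"],
                  ["me gusta el té", "el gato duerme"]))
```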

These efforts have enabled LASER to operate efficiently across many languages at scale, and Meta has recently extended LASER to speech processing.

To improve machine translation performance, Meta has invested significant resources in high-capacity models that remain efficient to train: sparsely gated mixture-of-experts models. By scaling up model capacity and learning automatic routing, different tokens can be processed by different experts. To extend its text-based machine translation to hundreds of languages, Meta built the first multilingual translation system that is not English-centric, and it outperforms the best bilingual translation models.
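Here is a minimal numpy sketch of the sparsely gated mixture-of-experts idea, assuming simple top-1 routing (Meta's production models are far larger and more elaborate): a router scores the experts for each token and only the chosen expert runs, so total capacity grows with the number of experts while per-token compute stays roughly constant.

```python
# Toy sparsely gated mixture-of-experts layer with top-1 routing.

import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS = 8, 4

# Each "expert" is its own small feed-forward weight matrix.
experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS))  # gating weights

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """tokens: (n_tokens, D). Route each token to its top-1 expert."""
    logits = tokens @ router                   # (n_tokens, N_EXPERTS)
    choice = logits.argmax(axis=1)             # hard top-1 routing
    gate = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    out = np.empty_like(tokens)
    for i, tok in enumerate(tokens):
        e = choice[i]
        out[i] = gate[i, e] * (tok @ experts[e])  # only one expert runs
    return out

tokens = rng.standard_normal((5, D))
print(moe_layer(tokens).shape)  # (5, 8): same shape, token-dependent experts
```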

In the blog post announcing the news, Meta Research provides no timeline for completing these projects, nor a detailed roadmap toward the goals. Instead, the company simply highlights what universal translation would make possible.

Meta also envisions this technology greatly benefiting its global products, extending their reach and turning them into indispensable communication tools for millions of people. As the blog post notes, universal translation software would be a killer app for future wearables such as the AR glasses Meta is building, and it would also dissolve language boundaries in the "immersive" VR and AR spaces that Meta is likewise building.

In other words, while developing a universal translation tool has humanitarian benefits, it also makes good business sense for a company like Meta.

The social media giant, whose market value shrank by nearly a third after its recent earnings report, has shifted its efforts to building a virtual world and even renamed itself to match. This month Meta reported that Reality Labs, its augmented and virtual reality business, lost $10.2 billion in 2021.

Pesenti, Meta's head of AI, said the company is exploring how AI can moderate content and behavior in the metaverse.

"On our major platforms, a lot of AI is used to tune in with the content. The metaverse is a little different because it's more real-time," Pesenti said. He says it's a "developing" effort, and Meta is also working on some of the strategic issues of the metaverse.

At the AI event, Zuckerberg said Meta is exploring how AI could interpret and predict the kinds of interactions that occur in the metaverse using self-supervised learning. With self-supervised learning, AI can learn from raw data rather than requiring large amounts of labeled training data.
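For readers unfamiliar with the term, here is a tiny generic sketch of the self-supervised recipe (not Meta's code): the training target is manufactured from the raw data itself by hiding part of the input, so no human labeling is needed.

```python
# Toy self-supervised example: masked-token prediction. The label (the
# hidden word) comes from the raw text itself, not from human annotation.

import random

corpus = "the avatar walked across the virtual beach".split()

def make_example(tokens: list):
    """Hide one token; the model's job would be to predict it back."""
    i = random.randrange(len(tokens))
    masked = tokens.copy()
    target = masked[i]
    masked[i] = "[MASK]"
    return masked, target

random.seed(0)
x, y = make_example(corpus)
print(x, "->", y)
```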

Meta is also studying egocentric data, that is, seeing the world from a first-person perspective. Zuckerberg said Meta has formed a global consortium with 13 universities and labs to advance research on Ego4D, the largest egocentric dataset available today.

For reference:

https://www.reuters.com/technology/metaverse-event-metas-zuckerberg-unveils-work-improve-how-humans-chat-ai-2022-02-23/

https://ai.facebook.com/blog/teaching-ai-to-translate-100s-of-spoken-and-written-languages-in-real-time/

https://www.theverge.com/2022/2/23/22947368/meta-facebook-ai-universal-speech-translation-project
