
The Evolution of AGI (Artificial General Intelligence): Multimodal Perception and Multi-task Collaboration

Author: Wu Yan is silent 0123


https://mp.weixin.qq.com/s/61fnKSh4A5jHftDZ8f5f5w

Recently we have talked a lot about AIGC. AIGC stands for Artificial Intelligence Generated Content, that is, content generated by AI, a term coined by analogy with PGC (professionally generated content) and UGC (user-generated content). AIGC, however, is only a branch line of AI development as a whole. What, then, is the current main line, the main goal? It is AGI, in which the G no longer stands for "generated" but for "general": AGI is Artificial General Intelligence.

The word "general" is still a bit abstract. We can simply understand it as having human-like, all-round ability, rather than being limited to one specific task; AlphaGo, for example, can only play Go. As Zhihuijun, a developer of humanoid robots, put it: "What we want AI to do is cook, clean the room, do the laundry, take out the garbage, shovel, work to make money, and other time-consuming and laborious chores; what AI is actually doing now is chatting, drawing, writing, composing, and playing games." So what research directions could lead to AGI?

The book "Generative Artificial Intelligence" mentions five directions, which I will go through in light of my own understanding:

First, cross-modal perception. Each kind of information source we are exposed to daily is called a modality: text, sound, images, taste, touch, and so on.

This is also the current concept of multimodality. Meta's open-source ImageBind, for example, binds images and video, text, audio, depth, thermal data, and IMU data (from inertial measurement units, which record motion) into a single embedding space, enabling multi-sensory content. With it we can generate pictures from audio, for example producing a picture of seagulls from the sound of seagull calls; we can also convert images into audio, text, video, and so on. In other words, these types of data become mutually convertible.
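To make the shared embedding space concrete, here is a minimal sketch of cross-modal matching in the style of the ImageBind repository's published example: text, images, and audio are embedded into one space and compared with dot products. The import paths, file names, and exact signatures follow the public repo's README and may differ across versions, so treat this as an illustration rather than the definitive API.

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

# Illustrative inputs; replace with your own files.
text_list = ["a seagull", "a dog", "a car"]
image_paths = ["seagull.jpg", "dog.jpg", "car.jpg"]
audio_paths = ["seagull_call.wav"]

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the pretrained ImageBind model (downloads weights on first use).
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_list, device),
    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}

with torch.no_grad():
    embeddings = model(inputs)

# Because all modalities live in one embedding space, a dot product
# scores how well each image matches the seagull call.
scores = torch.softmax(
    embeddings[ModalityType.AUDIO] @ embeddings[ModalityType.VISION].T, dim=-1
)
print(scores)  # the seagull image should score highest
```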

At present, familiar systems such as GPT-4 (ChatGPT), Baidu's Wenxin Yiyan, and iFLYTEK Xinghuo are all multimodal large models.

Second, multi-task collaboration. Humans can handle multiple tasks at the same time and coordinate and switch between them. When a person gives a robot a simple command, such as "please warm up my lunch" or "please bring me the remote control", the command sounds simple, but executing it involves a whole chain of steps: understanding the instruction, decomposing it into sub-tasks, planning a route, and identifying objects, as the sketch below illustrates.
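As an illustration of that decomposition, here is a minimal, purely hypothetical sketch. The skill names (navigate_to, detect_object, and so on) and the hard-coded plan are invented for this example and do not correspond to any real robot API; a real system would use an LLM or a task planner to produce the step list.

```python
from dataclasses import dataclass


@dataclass
class Step:
    skill: str    # a primitive the robot already knows how to execute
    target: str   # what the skill acts on


def decompose(command: str) -> list[Step]:
    """Map a natural-language command to primitive steps.

    This hard-coded table only illustrates the shape of the output;
    a real planner would generate it from the command.
    """
    plans = {
        "please warm up my lunch": [
            Step("navigate_to", "kitchen"),
            Step("detect_object", "lunch box"),
            Step("pick_up", "lunch box"),
            Step("operate", "microwave"),
            Step("deliver_to", "user"),
        ],
    }
    return plans.get(command.lower(), [])


for step in decompose("Please warm up my lunch"):
    print(f"{step.skill}({step.target})")
```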

At present, Microsoft's Copilot assistant has been built into the latest Windows 11: through natural-language dialogue it understands the user's intention and carries out the corresponding software operations, such as generating a PPT from a conversation and then automatically adjusting its style or content. Baidu Library (Baidu Wenku) can likewise use large models for knowledge summarization, document generation, and intelligent editing of documents. As Robin Li put it, "every application is worth redoing with a large model."

With multi-task collaboration, software products will in the future be operable through natural language, with no need to hunt through menus whose options you can never find; this is the currently popular concept of the Agent. Hardware robots will also be connected to large models so that they can understand human intentions. Boston Dynamics' robot dog, for example, can already talk to humans, and every humanoid robot project aims to converse freely with people, understand their intentions, and perform the corresponding operations. At the Shenzhen artificial intelligence exhibition this year, the robots I saw included coffee-making robots, automatic stir-fry machines, and the like; these can only be called robots for specific domains. The humanoid robots being developed by Tesla, Xiaomi, and Zhihuijun will have far greater versatility, and the skills they learn can be iterated continuously.
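A minimal sketch of the Agent idea follows: a loop in which a language model chooses which tool (in place of a menu action) to call for a natural-language request. The llm_choose_tool function, the tool names, and the canned JSON reply are all stand-ins invented for this example; a real agent would call an actual model API and dispatch to real application commands.

```python
import json

# Tools the agent may call; in a real product these would invoke
# actual application functionality rather than print.
TOOLS = {
    "create_slide": lambda title: print(f"created slide: {title}"),
    "set_theme": lambda name: print(f"applied theme: {name}"),
}


def llm_choose_tool(request: str) -> dict:
    """Stand-in for a real LLM call that returns a tool invocation.

    We fake the model's answer with a canned JSON response so the loop
    is runnable; a real agent would send `request` plus the tool
    descriptions to a model and parse its reply.
    """
    return json.loads(
        '{"tool": "create_slide", "args": {"title": "Q3 sales review"}}'
    )


def agent(request: str) -> None:
    # 1. Understand intent: ask the model which tool satisfies the request.
    call = llm_choose_tool(request)
    # 2. Act: dispatch to the chosen tool instead of making the user
    #    search through menus.
    TOOLS[call["tool"]](**call["args"])


agent("Make me a slide summarizing Q3 sales")
```

The point of the loop is the division of labor: the model translates intent into a structured tool call, and the software executes it, which is exactly the shift from menu-driven to language-driven operation described above.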

In the AI era, the pace of technological development has accelerated and the future has already arrived. What we need to do is broaden our knowledge, embrace change, and learn to swim in this new wave.

For more content, follow the WeChat official account: Wu Yan is silent 0123
