Over the past year, embodied intelligence has become one of the hard-technology tracks drawing the most attention and investment from industry, academia, and investors at home and abroad. Sustained heavy investment from all sides has put the industry in the fast lane: embodied intelligence companies have released new products one after another, and a number of start-ups have closed early-stage financing rounds on the order of 100 million yuan.
On September 25, the "Embodied Intelligence Special Forum" of the 2024 Baidu Cloud Intelligence Conference concluded at the Zhongguancun International Innovation Center in Beijing.
This is the first time that Baidu Intelligent Cloud has set up a sub-forum on embodied intelligence at its annual Cloud Intelligence Conference. Experts and scholars from across the field spoke about the overall development of embodied intelligence, key technical issues, and the practical experience of their own organizations.
The scene of the special forum
In the organizer's session, Baidu Intelligent Cloud shared its insights and practices in the embodied intelligence track over the past year and officially released its embodied intelligence track solution, setting out how it thinks about the track as a cloud vendor and how it intends to help advance it.
In the keynote session, the conference invited Liang Xiaodan, associate professor and doctoral supervisor at the School of Intelligent Engineering of Sun Yat-sen University, and Mao Yichen, executive general manager of CICC Capital, to interpret the current state of the embodied intelligence track. Talks were also given by Gao Yang, doctoral supervisor at the Institute for Interdisciplinary Information Sciences of Tsinghua University, director of Tsinghua's Visual and Embodied Intelligence Laboratory, and co-founder of Qianxun Intelligence; Ju Xiaozhu, head of large models at the Beijing Embodied Intelligent Robot Innovation Center; Gao Jiyang, co-founder and CEO of Xinghaitu; and Li Yuqian, head of the robotics business of NVIDIA China.
The roundtable dialogue was moderated by Ke Di, senior investment manager of BV Baidu Ventures. The panelists were Wang Qian, founder and CEO of X Square; Shang Hang, assistant professor, researcher, and doctoral supervisor at the School of Computer Science of Peking University; Zhang Zhizheng, partner and head of large models at Beijing Galaxy General Robot Co., Ltd.; Yang Fengyu, founder and CEO of UniX AI; and Zeng Guoyang, co-founder and CTO of Facewall Intelligence. They held an in-depth discussion of a number of important technical issues and application prospects in embodied intelligence.
▍Baidu Intelligent Cloud works with all parties in the track to accelerate the evolution of new species of embodied intelligence
Zhang Wei, general manager of the Baidu Intelligent Cloud Pan Technology Business Department, opened with a speech on the team's thinking and judgments since it entered the embodied intelligence track last year. He explained how the team decided on the timing of that investment, weighing external factors such as technology and policy against the stage of development the track had reached. He also made clear that Baidu Intelligent Cloud's ecological positioning in the track is to work with multiple partners to serve complete-robot manufacturers and other track enterprises, helping them build the core capabilities of embodied intelligent robot products more quickly and bring those products to market sooner.
Zhang Wei, general manager of Baidu Intelligent Cloud Pan Technology Business Department
▍Experts from academia and investment circles shared in depth the overall technology and market development of the embodied intelligence track
Liang Xiaodan, associate professor and doctoral supervisor at the School of Intelligent Engineering of Sun Yat-sen University, analyzed the current state of the key technologies of embodied intelligence and shared views on where they are heading. Liang pointed out that, as a highly integrated system, embodied intelligence cannot develop without close collaboration among hardware, algorithms, and data, and gave a systematic overview of the overall framework of embodied intelligence and the key technologies in each direction. Liang also presented work on open-source data, open-source simulation, and open-source models, emphasizing the central role of data and noting that high-quality, diverse datasets are essential for training high-performance embodied intelligence models.
Liang Xiaodan, associate professor and doctoral supervisor of the School of Intelligent Engineering, Sun Yat-sen University
Mao Yichen, Executive General Manager of CICC Capital, spoke about the background to the track's rapid growth, its industrial chain, and its likely development trends. She emphasized that close integration of technology, hardware, and application scenarios is what will drive embodied intelligence forward. Although the industry still faces challenges such as scarce data and high hardware costs, she believes that as the technology advances and the ecosystem matures, embodied intelligence will show great commercial potential in fields such as industrial manufacturing and commercial services. The Chinese market in particular, with its large demand base, rich application scenarios, and supportive government policies, is expected to have a chance to leapfrog competitors in embodied intelligence.
Mao Yichen, Executive General Manager of CICC Capital
▍Baidu Intelligent Cloud is committed to helping track enterprises build the core capabilities of embodied intelligent robot products in an all-round way
At this first Embodied Intelligence Special Forum, the organizer released and introduced its embodied intelligence track solution. Zhang Longjun, head of the embodied intelligence track in Baidu Intelligent Cloud's Pan Technology Industry, first expanded on the internal and external factors behind the team's decision to enter the track last year and described the core tasks that track enterprises currently face. He explained that Baidu Intelligent Cloud's ecological positioning is to help track manufacturers and enterprises build the key core capabilities of embodied intelligent robot products more quickly.
Building on that positioning, Zhang Longjun introduced Baidu Intelligent Cloud's embodied intelligence track solution and the progress of its cooperation with track enterprises in different directions. The solution focuses on helping manufacturers overcome the challenges of bringing products to market. Drawing on Baidu Intelligent Cloud's technology and product strengths, it offers support mainly in the following directions:
1) Provide the industry-leading Wenxin large model, partner Facewall Intelligence's end-to-end large model, and the Baidu Intelligent Cloud Qianfan large model service and development platform to help build the embodied "brain";
2) Provide a cloud-based simulation platform to help enterprises speed up training iterations of "cerebellum" motion-control algorithms and grow their developer ecosystem;
3) Provide professional, large-scale data collection and data annotation services to support the construction of embodied intelligence datasets;
4) Provide far-field voice interaction solutions to help complete robot products deliver an excellent human-machine voice experience;
5) Provide a security solution that spans cloud, network, and device to protect products across their full life cycle, from R&D through deployment and operation;
6) Provide high-performance, stable, and reliable cloud AI computing power and an AI infrastructure platform to support efficient training of all kinds of models.
Zhang Longjun, head of the embodied intelligence track, Baidu Intelligent Cloud Pan Technology Industry
▍Keynote talks: in-depth sharing on key manipulation capabilities, dataset construction, technical frameworks for deployment, and simulation platforms for embodied intelligence
The keynote session began with manipulation, the core ability that directly determines whether an embodied intelligent robot is actually useful. Gao Yang, doctoral supervisor at the Institute for Interdisciplinary Information Sciences of Tsinghua University, director of Tsinghua's Visual and Embodied Intelligence Laboratory, and co-founder of Qianxun Intelligence, spoke on "research and practice of embodied manipulation frameworks and manipulation skill learning". He pointed out that manipulation in complex environments is one of the key challenges in robotics and introduced CoPa, the embodied manipulation framework his research team proposed earlier, which greatly improves a robot's ability to manipulate objects in open-world scenarios by leveraging the common-sense knowledge embedded in foundation models. He also introduced two further research results, General Flow and efficient learning for embodied agents, which enable robots to learn from videos of human manipulation and transfer what they learn to new tasks, improving their ability to learn autonomously.
At the end of his talk, Gao Yang played a demo video recently made public by Qianxun Intelligence, showing the strong continuous long-horizon task execution of a robot under development. He added that Qianxun Intelligence's AI technology team is expanding rapidly and welcomes technical talent.
Gao Yang, Ph.D. supervisor of the Institute for Interdisciplinary Information Sciences, Tsinghua University, director of the Visual and Embodied Intelligence Laboratory of Tsinghua University, and co-founder of Qianxun Intelligence
Embodied datasets are one of the core drivers of technical progress in embodied intelligence. The Beijing Embodied Intelligent Robot Innovation Center is currently working with partners across the industrial chain to build "the largest, most information-dense, and most versatile high-quality dataset for embodied intelligent robots".
Ju Xiaozhu described the Innovation Center's thinking on building datasets that serve both industrial and academic research, as well as the limitations of existing datasets in the industry, and focused on the Center's overall plans and practice in datasets, data machines, data applications, and data platforms. Dr. Ju's introduction gave the audience a look at the in-depth, leading work on embodied intelligence datasets under way at the first provincial-level humanoid robot innovation center established in China.
At the end of the talk, Ju Xiaozhu also described the Center's close cooperation with Baidu Intelligent Cloud on data collection, which is rapidly scaling up the collection of high-quality real and simulated data on multiple types of robot bodies, such as humanoid robots and robotic arms.
Ju Xiaozhu, head of the large model of Beijing Embodied Intelligent Robot Innovation Center
Gao Jiyang, co-founder and CEO of Xinghaitu, spoke on "Insight into the implementation of embodied intelligence technology and the closed-loop elements of product business", drawing on his team's accumulated expertise in embodied perception and manipulation algorithms, its ability to bring cutting-edge technology to market, and its experience with large-scale mass production in autonomous driving. He said that Xinghaitu is currently focused on building "one brain, multiple forms" embodied intelligent robots and is developing the full stack in-house, covering robot bodies, end-to-end AI algorithms, and scenario solutions. In product design, Xinghaitu follows the principle that "intelligence defines the body": the robot body is designed around the needs and limits of its intelligence rather than starting from the mechanical structure.
Gao Jiyang also shared his view of the "marginal cost of intelligence", which he believes determines a company's competitiveness. Using its self-developed full-size robot body and core components, together with simulated and real data, Xinghaitu aims to reduce the cost of learning a new task, that is, the data cost, and so speed up product iteration and commercialization.
Gao Jiyang, co-founder and CEO of Xinghaitu
NVIDIA's Isaac platform is widely used by robotics developers, and several major updates for humanoid robots were announced at GTC 2024. Li Yuqian, head of NVIDIA's robotics business in China, introduced NVIDIA's positioning in the robotics industry, the advantages of the Isaac platform, and NVIDIA's strategy and ecosystem partnerships for accelerating embodied intelligence applications.
Li Yuqian said that NVIDIA's work in robotics currently covers three areas: training, simulation, and runtime. She highlighted the Isaac platform, which includes Isaac Sim, a robotics simulation platform, and Isaac Lab, a reinforcement learning training platform. Through a series of use cases, including synthetic datasets, reinforcement learning training, and humanoid robots, she showed how the Isaac platform improves the efficiency and performance of robot AI development.
Li Yuqian then described NVIDIA's strategy for accelerating embodied intelligence applications. She detailed the pre-trained models and toolchains NVIDIA provides to help developers build intelligent robot applications quickly. NVIDIA is also working with partners such as Baidu Intelligent Cloud to deploy cloud-based simulation services, offering a more convenient platform for developing embodied intelligence applications.
Li Yuqian, head of the robotics business of NVIDIA China
▍Roundtable dialogue: a wide-ranging, in-depth discussion of the technical issues and deployment prospects of embodied intelligence
The roundtable was moderated by Ke Di, senior investment manager of BV Baidu Ventures, who has extensive hands-on experience in the embodied intelligence track. The panel consisted of five senior technical experts and entrepreneurs with diverse research directions: Wang Qian, founder and CEO of X Square; Shang Hang, assistant professor, researcher, and doctoral supervisor at the School of Computer Science of Peking University; Zhang Zhizheng, partner and head of large models at Beijing Galaxy General Robot Co., Ltd.; Yang Fengyu, founder and CEO of UniX AI; and Zeng Guoyang, co-founder and CTO of Facewall Intelligence.
Roundtable panel of experts
Ke Di, senior investment manager of BV Baidu Ventures
On how large models are changing research paradigms, Zhang Zhizheng said that the versatility and generalization of large models mean traditional methods need to be rethought. He emphasized that the paradigm is shifting from developing task-specific control algorithms to training and building systems on top of large models. This shift is having a major impact on robotics, prompting developers to pay more attention to the closed loop between data and models and to how large models are deployed and applied in specific scenarios.
Zhang Zhizheng, partner of Beijing Galaxy General Robot Co., Ltd
On the possibility and challenges of skill emergence, Wang Qian pointed out that there are two paths. The first is a qualitative jump that occurs as a model's generalization improves. Most models today generalize only as far as adapting to simple changes in the physical environment or hardware configuration and handling new objects they were not trained on; once generalization improves to the point where a model can autonomously carry out new tasks that were never demonstrated, that can be called skill emergence. He said that X Square, by improving the model's basic capabilities, has achieved a level of advanced generalization and skill emergence in training that has not been reported before. In his view, this is only possible when an end-to-end unified foundation model, trained on sufficiently diverse data and scenarios, learns the general structure of physical laws and skill strategies. The second path introduces a try-assess-correct chain-of-thought capability. Unlike the language tasks that GPT-4 excels at, embodied intelligence tasks require an additional model to assess state transitions accurately, so the world model will be a key component on this path. Zhang Zhizheng said that Galaxy General, by training models on large-scale synthetic simulation data, has observed the emergence of some skills that did not appear in training, and he emphasized the importance of simulation data in research on skill emergence.
Wang Qian, founder and CEO of X Square
On the importance and application of world models, Wang Qian described X Square's practice of training a world model and applying it to embodied intelligence tasks. He believes that in the short term, different task domains call for different prediction priorities, so a variety of world models will be needed, while a unified, comprehensive world model is a possible future direction that could go beyond human ability.
Yang Fengyu believes that a world model's ability to learn internal representations of the environment and predict future states is crucial to closing the data loop for AI. He emphasized that a world model can generate not only data but also strategies and actions, driving end-to-end solutions to embodied intelligence tasks.
Yang Fengyu also pointed out that building a world model is a long road, and that generation and simulation are two different paths. Simulation has its advantages but also unavoidable drawbacks, and it does not necessarily follow the laws of the physical world. Moreover, even once the laws governing change in the real physical world are understood, working out how to realize them in neural networks and algorithms is bound to be a very long process. Humanity has not yet fully discovered the laws by which the world operates, so we can only use prior knowledge and neural networks to approach ever more realistic and comprehensive world models. A future unified world model will be beyond human imagination, and that prospect is what drives so many people to work toward it. He also said, "UniX AI is now mainly training humanoid robots on collected real-world data, and is making very good progress using its unique visual-tactile foundation model Unitouch to guide robot manipulation. UniX AI will speed up its data collection work and make robots more broadly applicable in complex scenarios such as the home."
Yang Fengyu, founder and CEO of UniX AI
Zeng Guoyang further pointed out that the core of the world model lies in the modeling and understanding of world changes, and although there are still great challenges, it is of revolutionary significance for the development of AI in the future.
Zeng Guoyang, co-founder and CTO of Facewall Intelligence
During the discussion, the panelists also gave their views on the importance of data in embodied intelligence R&D. Zhang Zhizheng said that when building an embodied foundation model, simulation data accounts for more than 90% of the data used, which allows the data volume to be scaled quickly and effectively to the level a large model requires.
Wang Qian believes that simulation is a cheap and easily annotated data source for tasks such as high-level decision-making and navigation, but that real-world data is especially important for high-precision manipulation tasks. Because the two types of data differ in how efficiently they train models, he expects that as the cost of real-world data falls, its overall cost for tasks such as general fine manipulation will match or even drop below that of simulation data.
Shang Hang, Assistant Professor, Researcher, Doctoral Supervisor, and Liberal Arts Young Scholar, School of Computer Science, Peking University
At the end of the roundtable, the panelists shared their visions for the future and the challenges they face. Shang Hang reviewed how the research paradigm of artificial intelligence has changed over the past ten years, expressed his hopes for general artificial intelligence with embodied intelligence as its carrier, and pointed out that embodied intelligence research still needs to advance data and models together in a closed loop. Zeng Guoyang dreams of creating an intelligent assistant like Jarvis in "Iron Man" to fully extend human capabilities. Yang Fengyu emphasized the potential of companion robots in education, medical care, and other fields. Wang Qian explained, from a technical perspective, what large models require in terms of data diversity. Zhang Zhizheng discussed the challenges and opportunities of building a data flywheel from the perspective of commercialization.
▍Conclusion and outlook
The "Embodied Intelligence Special Forum" of the 2024 Baidu Cloud Intelligence Conference gave participants from industry, academia, and the investment community an important platform for exchange and debate, and it concluded successfully.
The organizers hope this forum will mark an important stop on the road into the age of embodied intelligence and serve as a new starting point from which all parties continue working together to accelerate the evolution of new species of embodied intelligence and pursue the vast possibilities ahead.