July 6, 2024 - The World Artificial Intelligence Conference (WAIC 2024) entered its third day today, with the agenda shifting to the practical applications and challenges of AI technology in urban management and public services. The conference brought together experts, scholars and industry elites from all over the world to discuss the future direction of artificial intelligence and its impact on society.
As one of the invited media outlets, Data Ape again attended the event, where top experts and scholars from many fields presented their latest research results.
Unknown boundaries
The "Unknown Boundary" Large Model Exploration Future Forum at the 2024 World Artificial Intelligence Conference brought together outstanding young scholars from around the world to discuss frontier issues and future trends in large model technology, particularly in multilingual processing, graph learning, intelligent education, language agents, and GUI agents.
The forum focused on the following directions of large models: large language models, multimodal large models, application frameworks, innovative applications, alignment evaluation, and fine-tuning.
Gui Tao, an associate researcher at Fudan University, proposed an agent synthesis framework based on large language models that aims to align agents with human capabilities and values. Gui's research focuses on ensuring that agents driven by large models remain consistent with human values and avoid behavior that runs contrary to social norms.
In his speech, Gui examined strategies for aligning agents with humans: how to make agents not only perform tasks efficiently but also understand and follow the moral and ethical standards of human society. This touches on how agents learn, how they make decisions, and how they interact with humans. Gui also shared his views on the future of agents, pointing out that as large model technology advances, agents will move closer to human-level intelligence while also facing new challenges, such as handling complex moral dilemmas and making reasonable judgments without explicit instructions.
Gui's speech reflected the central role of large model technology in agent development and stressed the importance of alignment evaluation, which ensures that agent behavior stays in line with human values and social norms. This is not only a technical challenge; it also sits at the intersection of disciplines such as philosophy, psychology, and ethics. His talk points to a direction for agent research in which researchers pay as much attention to the controllability and ethics of agent behavior as to improving agent capabilities, so that artificial intelligence and human society can genuinely coexist in harmony.
Pictured: Gui Tao, an associate researcher at Fudan University
Singapore's Sea AI Lab, the team behind the Sailor large model that has drawn wide attention on GitHub, also attended WAIC. Team member Liu Qian used the "Sailor" project as an example to describe the complexity and diversity involved in working with multilingual data, and how effective strategies can improve the efficiency and performance of model training.
Liu Qian emphasized that training multilingual large models is a complex process that must deal with differences in grammatical structure, varying vocabulary sizes, and diverse cultural backgrounds across languages. To overcome these difficulties, he proposed a series of strategies, including a unified encoding scheme, cross-language pre-training methods, training on large-scale corpora, and techniques such as transfer learning. These strategies improve the model's adaptability and generalization across languages, allowing it to better understand and generate content in multiple languages.
Drawing on the latest progress of the "Sailor" project, Liu Qian showed how these strategies can improve training efficiency and model performance, offering practical experience for building high-quality multilingual large models. He also discussed lessons learned and challenges encountered during training, such as how to balance the size and quality of multilingual datasets, how to address the training bottleneck for low-resource languages, and how to keep the model consistent and accurate across languages.
Pictured: Liu Qian, a member of the Sea AI Lab team in Singapore
Huang Chao, Assistant Professor at the University of Hong Kong, gave a speech titled "Exploring the Power of Large Language Models (LLMs) in Graph Learning". He focused on how LLMs can enhance graph learning, emphasizing their particular advantages in processing graph data and their innovative applications in social network analysis, recommender systems, and other fields.
Huang Chao first introduced the application of LLMs in graph learning and explained how they can improve data processing by understanding complex graph structures. LLMs can capture the complex relationships between nodes in a graph, which makes them well suited to tasks such as analyzing interpersonal relationships in social networks and predicting user preferences in product recommendation systems. By drawing on the contextual understanding and generation capabilities of LLMs, graph learning can mine implicit patterns in graph data more effectively, improving the accuracy of decisions and predictions.
Pictured: Huang Chao, Assistant Professor at the University of Hong Kong
Yu Jifan, an assistant researcher at Tsinghua University's Institute of Education, said that an intelligent education environment should not stop at simply applying intelligent tools, but should move toward a deeper understanding and use of intelligent capabilities. This means leveraging the deep learning capabilities of large models to create customized learning paths that adapt to each student's learning style and needs. In this way, a large model can provide immediate feedback to help students master knowledge while monitoring their cognitive development, so that the learning process stays effective and targeted.
Pictured: Yu Jifan, an assistant researcher at the Institute of Education, Tsinghua University
In an intelligent education environment, large language models can play a variety of roles, such as virtual tutors, learning resource recommenders, and learning progress trackers. They are able to adapt the teaching content to the student's learning history and performance, and even anticipate the learning difficulties that the student may encounter, providing assistance in advance. This highly personalized approach to learning helps to stimulate students' interest in learning and improve learning efficiency, while also reducing the workload of teachers.
After that, Gu Yu, a Ph.D. student at The Ohio State University, presented a new evaluation system in his report, aiming to address the complex problems that multimodal large models bring to language agents. He analyzed the current state of multimodal language agents in depth and offered his own views on their future development. Multimodal language agents can understand, process, and generate multiple forms of information such as text, speech, and images, which places new demands on an approach built around single-modal models. Gu Yu's framework considers the fusion of multimodal data, cross-modal reasoning, and the interaction between agents and human users, in order to improve how agents understand and respond in complex situations.
Gu Yu's presentation highlighted the difficulty of multimodal data processing and how to overcome these obstacles during training and evaluation. He pointed out that existing evaluation methods may not be sufficient to measure the overall performance of multimodal agents, so new evaluation strategies are needed. His evaluation framework aims to assess agent performance in a multimodal environment comprehensively, including semantic understanding, emotion recognition, situational awareness, and other capabilities, which is essential to making multimodal agents practical.
Pictured: Gu Yu, a Ph.D. student at The Ohio State University
Gu Yu's speech gave the audience an in-depth view of the latest advances in the theory and practice of multimodal language agents, and of how better evaluation methods can further improve agent performance in real-world applications. This research direction matters not only to the academic community; it is also expected to have a profound impact on fields such as human-computer interaction, customer service, and smart homes, pointing to a future in which intelligent agents communicate and collaborate with humans more naturally.
Industry pioneers
Putting AI into practice depends on the research of experts and scholars, but its application by enterprises is equally important.
Lianhui Technology: The second generation of multimodal agents
At WAIC, Lianhui Technology officially launched its second-generation multimodal agent, OmAgent, and the Om multimodal agent product series, creating a "super agent assistant" for industry users. OmAgent deeply integrates the comprehensive perception of the large model OmDet V2 with the thinking and decision-making capabilities of the large model OmChat V2. OmDet V2 improves perception speed by more than 20 times through its EFH high-performance fusion head and a series of optimization techniques. OmChat V2 supports a context length of up to 512K, can handle complex inputs such as video and mixed image-text content, and performs well at judging temporal relationships and understanding relationships across multiple images.
The new Om multimodal agent product series includes space operation agents and knowledge service agents. Space operation agents comprehensively perceive the physical environment through IoT devices to achieve refined space management. Knowledge service agents focus on digital asset management, improving decision-making quality and operational efficiency. Om multimodal agents can be applied across many industry scenarios.
Kingdee: Redefining the financial framework
At the "Intelligent Finance" forum of the 2024 World Artificial Intelligence Conference, Kingdee shared what is changing and what remains unchanged in financial management in the AI era, Kingdee's own AI strategy, and successful AI application cases from some of its customers.
Redefining the financial management framework from a gyroscope to an hourglass means reducing investment in the records system and increasing investment in the combat system and support system, which helps enterprises create more value in their core business. With AI, financial forecasting has shifted from relying on experience to relying on big data and deep learning, becoming more accurate and quicker to adapt to market changes. Financial management information has shifted from exclusive to inclusive, as AI assistants make information easy to obtain. Expert services have shifted from individual elites to AI teams, that is, teams of AI systems that can provide comprehensive cross-domain services. The focus of external reporting has shifted from financial indicators to the evaluation of development capability, and AI can help generate ESG reports and assess a company's capacity for development.
Enterprises are upgrading from traditional financial systems to "AI + Finance" intelligent platforms, where an AI assistant can support decision-making at any time and improve efficiency. Finance professionals need to shift from watching AI to embracing it, and Kingdee has made AI a strategic priority to drive product innovation.
Bilibili: To be the largest AIGC community in China
As the most influential community in China's AI field, Bilibili comprehensively showcased its achievements in AI content ecology, AIGC creation, and AI technology research and development at the 2024 World Artificial Intelligence Conference (WAIC 2024). Bilibili's AI content covers a wide range of topics, including model evaluation, training, audio and video, popular science, industry news, and algorithm discussions. It attracts a large number of young users, especially those born after 2000, who are the main consumers of AI content, and average daily playback volume has grown by more than 80% year-on-year.
Bilibili is not only a platform for acquiring AI knowledge but also a hot spot for AIGC (AI-generated content) creation. UP owners, as uploaders on the platform are known, use AI technology for creative work such as voice cloning, image generation, and virtual human creation, which is driving the rapid development of the AIGC ecosystem. Bilibili has also encouraged users to take part in AIGC creation by holding creative activities and competitions, such as AI video and music contests.
Gechuang Dongzhi: an industrial operating system in the era of large models
Pictured: Yang Li, Marketing Director of Gechuang Dongzhi
Gechuang Dongzhi is a leading enterprise in China's industrial AI sector. At the conference, its Marketing Director, Yang Li, shared her insights on industrial AI and new quality productivity, and discussed the prospects for AI in manufacturing with industry colleagues. Gechuang Dongzhi focuses on using AI, big data, and Internet of Things technologies to redefine the industrial operating system, and on promoting the intelligent transformation of manufacturing through AI-driven industrial solutions that improve production efficiency and product quality.
Gechuang Dongzhi's one-stop data intelligence solution includes nine AI application functions, such as yield monitoring, anomaly localization, and prediction and early warning. It covers every stage of manufacturing, from R&D to after-sales service, and uses data models to enable intelligent production that balances cost against yield. For example, TCL Huaxing has upgraded its automation, data, and intelligence capabilities through Gechuang Dongzhi's industrial intelligence platform and has been rated as one of the first batch of "digital pilot" enterprises in the country. In another case, a semiconductor wafer fab has effectively reduced labor and loss costs through the AI services provided by Gechuang Dongzhi.
Collision of ideas
The development of artificial intelligence is essentially a feast of ideas shared across three communities: enterprises, developers, and scholars. Enterprises lead the commercialization of technology with keen market insight and abundant resources, turning abstract concepts into products and services within easy reach. Developers are the core driving force in this process, continuing to optimize algorithms and improve functionality with strong technical skills and a deep understanding of user needs, making AI systems more intelligent and more humane. Scholars stand at the forefront of theoretical research, revealing the internal mechanisms of AI through rigorous experiments and careful analysis, proposing new algorithms and models, and laying a solid scientific foundation for technological progress. Through frequent exchange and cooperation, the three share knowledge and spark new ideas in one another. Every dialogue across these boundaries may give rise to disruptive innovation, and every in-depth collaboration has the potential to open up new research directions. It is this diverse interaction and integration that is driving AI technology forward at unprecedented speed and steadily expanding the boundaries of human cognition and capability.