
When I opened the Wenxin model, I saw that it was all productivity

Reported by Heart of the Machine Pro

Authors: Zenan, Xiaozhou

Baidu Wenxin pressed the Turbo key.

Recently, people have been keen to put large models to the test.

Whether it is benchmark scores for machines or the college entrance exam for humans, technological progress is constantly being quantified in head-to-head evaluations, and an AI that performs well in these comparisons enjoys a burst of popularity.

In the real world, however, there is often no standard answer, and AI runs into situations never seen in training. Moreover, large-model applications face the hard question of whether they are actually useful. For a technology evolving this fast, the effect of real-world deployment is the most important measure of capability.

In Lancang County, Yunnan Province, people are using the "Farmer Academician Agent", built on the Wenxin Agent Platform, to carry out dryland farming under the guidance of Academician Zhu Youyong.


The new intelligent coding assistant, Wenxin Quick Code, is in deep use by 80% of Baidu's engineers, with a code adoption rate of 46%.


Even ancient oracle bone inscriptions have been brought to life by AI and can now "speak" to us: click on an oracle bone character, and you can see a paraphrase generated by the large model.


All of this was demonstrated yesterday at Baidu's WAVE SUMMIT deep learning developer conference. Baidu is racing day and night down the road of "practicality".

Wenxin large model, entering the Turbo era

Two months ago, Wenxin Model 4.0 Tool Edition was released; now comes Wenxin Model 4.0 Turbo.

Yesterday, Baidu launched the latest version, Wenxin 4.0 Turbo, a further upgrade of the Wenxin Model 4.0 launched in April this year. The new version is faster and performs better; it has rolled out on the Wenxin Yiyan web version and app, and the developer API is live as well.


To feel the speed of 4.0 Turbo, compare it directly with Wenxin Model 4.0:


The outputs differ, and both are of good quality, but the difference in speed is unmistakable: Turbo is simply fast.

We gave the web version of Turbo a first try. The model appears aware of recent news, generates answers far faster than we can read them, and its well-organized responses are clear and logical, with citation links at the end.


With the new-generation deep learning platform PaddlePaddle as the technical base for 4.0 Turbo, Baidu has increased the volume of training data, optimized its distribution and quality, and continuously iterated the training algorithms. On top of this, tuning techniques such as supervised fine-tuning, reinforcement learning from human feedback, and prompt engineering continue to improve, as do the knowledge enhancement, retrieval enhancement, and dialogue enhancement technologies unique to the Wenxin model.

Agent capabilities have also been strengthened in Wenxin 4.0 Turbo. On top of the powerful base model, further "thinking-enhancement" training improves the agent's ability to understand, plan, reflect, and evolve. Agents built on the large model can now execute reliably, evolve on their own, and, to a certain extent, expose their thought process as a white box. Through agents, AI can think and act like humans, autonomously calling tools to complete complex tasks and continuing to learn and evolve in its environment.
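The plan-act-reflect loop described above can be sketched in miniature. Everything below is an illustrative toy, not Wenxin's actual agent API; the tool registry, the planning rule, and the trace format are all invented for the example:

```python
# Toy agent loop: plan -> act (call a tool) -> reflect.
# All names and logic are illustrative, not Wenxin's actual API.

def calculator(expression: str) -> str:
    """A 'tool' the agent can call; eval is restricted to bare arithmetic."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def toy_agent(task: str):
    trace = []  # the "white-boxed" thought process
    # Plan: a stand-in for model reasoning -- pick a tool if the task looks arithmetic.
    if any(op in task for op in "+-*/"):
        trace.append(("plan", "use calculator"))
        answer = TOOLS["calculator"](task)  # Act: call the tool
    else:
        trace.append(("plan", "answer directly"))
        answer = "(model answers from its own knowledge)"
    # Reflect: check that a result was produced before finishing.
    trace.append(("reflect", "answer produced" if answer else "retry"))
    return answer, trace
```

For instance, `toy_agent("2+3*4")` returns the answer together with the recorded plan and reflect steps, which is the sense in which such a loop can be "white-boxed".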

At present, the Wenxin model family spans models of different sizes and performance profiles, including Wenxin Lightweight, Wenxin 3.5, Wenxin 4.0, and Wenxin 4.0 Turbo, along with large-model agent technology, targeting developers and covering most scenarios.

Among them, the Wenxin lightweight models suit well-defined scenarios and offer excellent performance and cost-effectiveness. Wenxin 3.5 is a good generalist, fit for everyday information processing and text generation. Wenxin 4.0 is larger and more capable, with stronger comprehension, logical reasoning, and richer knowledge, providing professional, in-depth help; built on agent technology, it is also good at using a variety of tools and data to complete very complex tasks on demand.

The newly released Wenxin Model 4.0 Turbo delivers strong results at even higher speed.

The ability of large models is no longer floating on top of code

Real-world deployment is where large models are headed, and continuous practice points the technology toward new directions for improvement.

At the WAVE SUMMIT, we saw that large-model capability no longer floats above the code: through down-to-earth projects such as the "Farmer Academician Agent" and the sports model, it is becoming a meaningful tool in many industries and creating unprecedented value in practical applications.

In Lancang Lahu Autonomous County, Yunnan Province, growing rice used to be very difficult because of barren land and frequent natural disasters. In 2015, Zhu Youyong, academician of the Chinese Academy of Engineering, and his team went into the mountains to carry out poverty alleviation through science and technology, teaching local farmers specialized techniques for growing high-quality dryland rice and other crops. Through Academician Zhu's efforts, local farmers mastered these techniques, and the level of crop cultivation improved qualitatively.

Crop cultivation, however, can run into all kinds of specific production problems. If farmers could consult Academician Zhu about planting anytime, anywhere, they would do an even better job of growing high-quality dryland rice and other crops.

In the age of artificial intelligence, this problem is solved by AI.

At the WAVE SUMMIT, Baidu showcased the "Farmer Academician Agent", the first agricultural agent, created jointly by Academician Zhu Youyong's team and Baidu. Built on the Wenxin Agent Platform, it has learned Academician Zhu's research results and related agricultural knowledge, so farmers can put questions about agricultural production to the agent anytime, anywhere, and get professional, detailed answers.


The agent is available on the web, in the app, and on small smart devices. In the Wenxin Yiyan app, we found that turning on the "Farmer Academician Agent" lets us ask specific questions about crop planting and get professional answers:


"Academician Zhu is in my mobile phone, exactly like the man himself." "He answers whatever we ask, as if he were sitting right next to me," said a villager in Yunshan Village, Zhutang Township, Lancang Lahu Autonomous County, of the "Farmer Academician Agent".


The "Farmer Academician Agent" has become a powerful knowledge assistant for local villagers. It shows the practical value of large models in specialized fields, making the vision of empowering every industry concrete. Foreseeably, agents equipped with specialized knowledge will become qualified knowledge assistants.

AI can also help athletes achieve better results. Baidu and Shanghai Sport University have explored sports technology together: building on advanced AI models and integrating a large body of sports expertise, they created a "sports model" that supports assisted training, technical and tactical analysis, real-time feedback, data collection, posture analysis, media communication, and more across many sports.

These AI applications now cover many national teams, including swimming, track and field, gymnastics, trampoline, and rock climbing, and have supported training in preparation for major events; some athletes competing at the Paris Olympics have been helped by AI. The sports model has also played a role in public fitness.

At a time when many companies are still chasing benchmark scores, Baidu offers more convincing figures: Wenxin Yiyan has reached 300 million cumulative users and 500 million daily calls, and over the past six months the average daily number of questions has grown 78% while the average question length has grown 89%.

Within Wenxin Yiyan, users' engagement with large-model products keeps deepening: once some needs are met, people quickly find more scenarios, and what began as simple question-and-answer queries has grown into complex tasks, complete with rules and examples, that the large model is asked to carry out.


On the developer side, Wenxin's Galaxy co-creation program has produced 550,000 AI-native applications, more than 1,000 large-model tools, and more than 1,000B of high-quality scarce data.

Naturally, the value it unlocks also helps engineers directly, most visibly in programming.

Development is speeding up

Baidu's intelligent coding assistant Comate now has a Chinese name: "Wenxin Quick Code". As a smart IDE plug-in, it supports 19 major IDEs and more than 100 programming languages.


Chen Yang, vice president of Baidu, said that with large-model support, Wenxin Quick Code can continue existing code, generate code from natural-language instructions, write code from comments, generate comments for code, and be enhanced and fine-tuned with private-domain knowledge.

Wenxin Quick Code 2.5, released yesterday, covers the entire development process, adds knowledge enhancement, and brings major improvements in enterprise-grade security.


The "quick" in the name shows in three ways: faster development, faster business iteration, and faster enterprise rollout.

Why is development so much faster? Behind it is AI's deep understanding and application of R&D knowledge. The experience of hundreds of technical experts, combined with billions of items of R&D knowledge, has produced a development super-assistant whose code generation reaches 80% accuracy.

Reportedly, since Baidu adopted Wenxin Quick Code internally, the amount of code engineers submit per unit of time has risen 35%.

Beyond that, the whole development process has accelerated. It helps you think through requirements, writes for you during development, assists with changes during testing and release, reminds product managers of internal specifications, and continuously detects security vulnerabilities in the code. Within Baidu, end-to-end business iteration has sped up 14% since Wenxin Quick Code was rolled out.

Eventually, this toolchain can be rolled out to more companies, and Wenxin Quick Code provides a full set of best practices and processes. Of Baidu's tens of thousands of engineers, 80% use Wenxin Quick Code in depth, the largest team in China using an intelligent coding assistant. Externally, Ximalaya achieved full rollout in one quarter, with a code adoption rate of 44%.

That adoption rate may be higher than some human programmers achieve. Reportedly, many customers have deployed Wenxin Quick Code, including Mitsubishi Elevator, iSoftStone, and Geely Automobile: more than 10,000 enterprises across thousands of industries.

A paddle chasing the waves

The Wenxin model can keep evolving rapidly thanks to Baidu's full-stack layout, from chips to frameworks to models and applications, and the joint optimization with the PaddlePaddle deep learning platform plays a large part.

At the WAVE SUMMIT, Baidu unveiled its next-generation AI framework, Paddle Framework 3.0, which is now available to developers.

The new version is designed around the current trends of large-model development and heterogeneous multi-chip hardware. It unifies large-model training and inference, emphasizes automatic parallelism in large-model development, realizes compiler-based automatic optimization to simplify development and tuning, and completes multi-hardware adaptation for large models.


To realize these advantages, starting from the requirement of unified training and inference, Baidu designed a highly extensible intermediate representation, PIR, at the bottom layer and built an efficient, flexible Pass mechanism on top of it, cutting development cost by 58% and speeding up inference by more than 10% for 84% of the models in the PaddlePaddle model library.

As is well known, developing hybrid parallelism for large models is complex, involving parallel strategies, communication, and scheduling. To simplify this work, Baidu built automatic parallelism, which encapsulates the parallel code, enables global static optimization, and raises the performance ceiling. With PaddlePaddle's unified dynamic-static automatic parallelism, training performance improves by up to 20% across models of different sizes.
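As a rough intuition for what "automatic parallelism" buys the developer, here is a toy, framework-free sketch: the user writes a plain computation, and helper functions decide how to shard the batch across devices and merge the partial results. The sharding rule and device count are invented for illustration and have nothing to do with PaddlePaddle's real implementation:

```python
# Toy data parallelism: the "framework" (shard/all-reduce helpers) decides how
# work is split across devices; the user code stays serial. Illustrative only.

def shard(batch, num_devices):
    """Split a batch into near-equal, contiguous per-device shards."""
    base, extra = divmod(len(batch), num_devices)
    shards, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards

def parallel_sum_of_squares(batch, num_devices=4):
    # Each "device" computes a partial sum; an all-reduce (here: sum) merges them.
    partials = [sum(x * x for x in piece) for piece in shard(batch, num_devices)]
    return sum(partials)
```

The point of the real feature is that the second function's sharding and reduction would be derived automatically from a serial program rather than written by hand.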

Performance optimization is an essential property of an AI framework, and combined with compiler design, PaddlePaddle greatly simplifies it. The front end maps operators into a compiler representation; the back end lowers that representation to hardware-facing code, enabling automatic optimization. Automatic operator fusion by the compiler runs 4x faster than individual operator calls and 14% faster than manual fusion. Through this series of compiler optimizations, the inference performance of generative models, language and diffusion alike, improves significantly, by up to 30%.
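Operator fusion itself is easy to illustrate outside any framework. In the toy below, the unfused version runs two "kernels" with an intermediate buffer, while the fused version does the same work in one pass; a compiler performs this transformation automatically. The speedup figures in the article (4x, 14%) are Baidu's and are not reproduced by this sketch:

```python
# Toy operator fusion for y = relu(2 * x), elementwise.

def unfused(xs):
    tmp = [2 * x for x in xs]                # kernel 1: scale (writes a buffer)
    return [t if t > 0 else 0 for t in tmp]  # kernel 2: relu (reads it back)

def fused(xs):
    # One pass, no intermediate buffer: what a compiler-fused kernel does.
    # (2*x > 0 exactly when x > 0, so the check can use x directly.)
    return [2 * x if x > 0 else 0 for x in xs]
```

Both produce identical results; fusion only removes the intermediate memory traffic and kernel-launch overhead.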

Unified training and inference matters in large-model design. For example, PaddlePaddle can automatically convert dynamic graphs into static graphs, so training, compression, and inference connect seamlessly. By invoking high-performance operators, RLHF training can be accelerated 2.1x; and because quantization can reuse the distributed strategies, quantization efficiency rises 3.8x.
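Dynamic-to-static conversion can be sketched with a toy tracer: run the function once in "eager" mode while recording every op, then replay the recorded static program without re-executing the Python. This is only loosely analogous in spirit to what PaddlePaddle does (its real entry point is `paddle.jit.to_static`); the class and op set here are invented:

```python
# Toy dynamic-to-static: trace a dynamic function once into a static op list.

class Tracer:
    """Records each op applied to it while also computing eagerly."""
    def __init__(self, value=0):
        self.value = value
        self.ops = []

    def mul(self, k):
        self.ops.append(("mul", k))
        self.value *= k
        return self

    def add(self, k):
        self.ops.append(("add", k))
        self.value += k
        return self

def to_static(fn):
    """Trace fn once; return a replayable static program."""
    t = Tracer()
    fn(t)
    program = list(t.ops)  # the captured "static graph"

    def run(x):
        for op, k in program:
            x = x * k if op == "mul" else x + k
        return x
    return run
```

For example, `to_static(lambda t: t.mul(3).add(1))` captures the two ops once and can then be run on any input without going through Python control flow again.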

Through more than 30 interfaces, PaddlePaddle fully supports large-model training and inference; hardware vendors only need to adapt the basic operators to plug in, which greatly reduces their workload. PaddlePaddle also invests in software-hardware co-optimization to extract further collaborative performance gains.


The PaddlePaddle platform matters enormously to the large model: many of Wenxin's capabilities can only be realized through joint optimization with PaddlePaddle. It is the relationship between a boat and its oar.

In basic compute optimization, PaddlePaddle implements attention with block-sparse masks, fine-grained recomputation, and an optimal balance of storage and compute during training; for distributed scaling, it supports flexible-batch virtual pipeline parallelism and hybrid parallelism across model structures. Communication has also been jointly optimized with the hardware.

For inference, LoRA adapters are deployed densely, using high-performance sliced matrices and multi-stream accelerated computation for higher efficiency: at the same precision, LoRA inference performance improves 33.3%; after quantization it improves 113.3%, and the number of LoRAs served can reach 6x.
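The core idea behind serving many LoRA adapters densely is that all adapters share one large base weight W, while each request adds only its own small low-rank correction B @ A. A toy sketch with a hand-rolled matrix-vector helper for self-containment (the shapes and numbers are invented; PaddlePaddle's actual kernels are far more sophisticated):

```python
# Toy LoRA forward pass: y = W @ x + B @ (A @ x).
# W is shared across adapters; only the small A, B differ per request.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x):
    base = matvec(W, x)            # shared base path
    low = matvec(B, matvec(A, x))  # low-rank path: never materializes W + B @ A
    return [p + q for p, q in zip(base, low)]
```

Because A and B are tiny compared with W, dozens of adapters can sit in memory next to a single copy of the base weights, which is what makes serving "6x the number of LoRAs" feasible.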

PaddlePaddle also supports hybrid deployment across heterogeneous chips with dynamic scheduling, routing different requests to chips of different capability to maximize resource utilization.
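A minimal model of such heterogeneous scheduling: dispatch each request to the chip that would finish it earliest, given the chip's relative speed and current queue. The greedy earliest-finish policy and the speed numbers below are invented for illustration; they are not PaddlePaddle's actual scheduler:

```python
# Toy heterogeneous scheduler: greedy earliest-finish dispatch.

def schedule(requests, chips):
    """requests: list of work sizes; chips: {name: relative speed}.
    Returns (per-request chip assignment, per-chip finish time)."""
    finish = {name: 0.0 for name in chips}
    assignment = []
    for work in requests:
        # Pick the chip that would complete this request soonest.
        best = min(chips, key=lambda c: finish[c] + work / chips[c])
        finish[best] += work / chips[best]
        assignment.append(best)
    return assignment, finish
```

With two chips `{"fast": 2.0, "slow": 1.0}` and requests `[4, 4, 2]`, the first two requests go to the fast chip and the third to the slow one, keeping both busy instead of queueing everything on the fast chip.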

One set of numbers offers a glimpse of the ecosystem PaddlePaddle and Wenxin have built: 14.65 million developers, 370,000 enterprises and institutions, and 950,000 models.


From computing power to framework to model to practice, China's first full-link ecosystem for large models continues to play its part in the global large-model race. Where others finish is only where Baidu starts.

Artificial general intelligence: the dawn has appeared

As 2024 nears its halfway point and the "war of a hundred models" passes the one-year mark, where is the industry headed? What will it compete on next?

At yesterday's conference, Wang Haifeng, Baidu's chief technology officer and director of the National Engineering Research Center for Deep Learning Technology and Application, interpreted the development of AGI from two angles: the generality of the technology and the comprehensiveness of its capabilities.


First, the generality of the technology. After decades of development, in the era of large models a single set of architectures and techniques can already solve all kinds of problems. Beyond algorithms, the models themselves have become more general and unified: different tasks, languages, scenarios, and modalities can all be handled by the same base model.

Take natural language processing. It used to comprise many sub-directions, such as word segmentation, syntactic analysis, semantic matching, machine translation, question answering, and dialogue, but now a single large language model can handle most of these tasks. On language, large models not only work within one language but across languages, and they learn not only natural human languages but also artificially defined formal languages, building a bridge from thought to execution. Large models also enable unified multimodal modeling, empowering applications across industries. In short, AI technology is becoming ever more general.

Then there is the comprehensiveness of capability. Comprehension, generation, logic, and memory are the four basic capabilities of artificial intelligence, and its hallmark abilities, such as creation, problem solving, coding, planning, and decision-making, are essentially combinations of these four. The stronger they are, the closer we get to artificial general intelligence.

But turning this general technology into comprehensive capability is not something everyone can pull off.

Because large models place high demands on talent, computing power, and data, the competitive landscape is taking shape under intense competition, and from startups to major companies, a leading echelon has pulled ahead.

Going a step further, players who have truly built a complete AI technology stack must face use cases head-on and create real-world applications that drive productivity. Compared with developing the technology, deploying it may pose more, and greater, challenges.

Two weeks ago, news that Microsoft's Copilot GPTs would be discontinued stirred concern in the industry: after only three months, an application with many users was retired over a "strategic adjustment". Unclear use cases and a lack of commercial returns are both possible factors.

Recent media reports also say that OpenAI's revenue from selling large-model capabilities such as GPT-4 has surpassed the revenue of its backer Microsoft in similar businesses.

No matter how advanced the technology or how low the cost per token, AI applications that cannot embed themselves in a real scenario ecosystem will be quickly eliminated, even those from top technology giants. Even a company like Microsoft faces this challenge.

And embracing the scene may be what domestic technology companies are good at.

Seeing how Wenxin Yiyan has developed and landed, we can say that "AI has entered the stage of industrial mass production", proclaimed at the first WAVE SUMMIT in 2019, is becoming reality step by step. As large models enter a period of industrial explosion, artificial general intelligence draws nearer.
