
Zhang Peng of Zhipu AI: Chinese entrepreneurs should also do 0 to 1 innovation, and the basic capabilities of large models are comparable to GPT-4


Author: Ye Hao

Edited by Kang Xiao

Produced by Deep Web, Tencent News Xiaoman Studio

In the second half of 2023, after taking part in a number of domestic large-model industry seminars, Zhang Peng, CEO of Zhipu AI, came to a strong realization: Chinese entrepreneurs should not label themselves as good only at "1 to 100" innovation, and then let that label limit their thinking about the possibility of "0 to 1" breakthroughs.

"Can we throw this label away completely, and stop using it to limit our pace of innovation and progress?"

Zhang Peng's question stems from the cognitive gap between China's large-model products and ChatGPT.

"To put it bluntly, they capture the form but not the spirit. The two sides are consistent at some levels; for example, both invariably treat multimodality as a key feature of their latest models. But not everyone can clearly explain why multimodality is important, and that is the difference."

Zhang Peng and Zhipu AI have tried to understand the logic behind OpenAI at its source in order to align with the level of GPT-4.

Against this backdrop, on January 16, at the 2024 Zhipu DevDay, Zhang Peng released GLM-4, a new generation of foundation model.

According to the company, GLM-4 achieves a significant upgrade in basic capabilities, with performance improved by 60% over the previous generation, approaching GPT-4. It supports longer contexts and stronger multimodality, including more accurate text-to-image generation and richer semantic image understanding.

At the same time, GLM-4-All Tools can automatically understand and plan complex instructions according to user intent, freely invoking WebGLM search enhancement, Code Interpreter, and multimodal generation capabilities to complete complex tasks. Zhipu also launched GLMs, a personalized agent customization capability: any user can create their own GLM agent with a simple prompt.

There is no doubt that Zhipu AI is currently riding the crest of the wave. In June last year, in a roundup by Silicon Valley technology outlet The Information, Zhipu AI was named one of the five companies most likely to become "China's OpenAI".

At Zhipu AI's 2023 all-staff meeting, Zhang Peng remarked, "I am very lucky to have been born in this era; in a lifetime of just a few decades, I may be able to catch such a wave of technology."

Founded in 2019, Zhipu AI is one of the earliest companies in China to develop large models, spun out of the technical achievements of the Knowledge Engineering Group (KEG) laboratory at Tsinghua University. Zhipu AI completed a total of 2.5 billion yuan in financing last year, with investors including Meituan, Ant, Alibaba, Sequoia, and Hillhouse.

After a year of catching up with ChatGPT, Zhang Peng believes the theme for China's large models in 2024 is to reach for the sky while standing on the ground. "We hope to keep up with the most advanced level in the world and try to 'reach for the sky' in technology and application; 'standing on the ground' means doing a good job of commercializing the company," Zhang Peng told Deep Web.

The following is a transcript of Tencent News's Deep Web interview with Zhang Peng, CEO of Zhipu AI, edited for length without changing the original meaning:

"Aligning with GPT-4"

Deep Web: Over the past half year, has the pace of development of GPT and of domestic large-model products met expectations?

Zhang Peng: GPT's development has been quite fast, reaching 200 million users in a few months. Whether it's GPT-4, the GPT Store released in November, or the GPT-5 rumors that recently went viral, true or not, you can see that OpenAI is also updating and iterating rapidly; it really hasn't disappointed.

Domestic development has also been quite fast. Looking at domestic models from an outside perspective, we can clearly feel that overseas observers recognize the pace of development of domestic products.

In articles written by well-known foreign experts and teams, viewed across the field's chronological evolution, products from Chinese vendors appear more and more often; our foreign counterparts pay close attention to what we are doing.

Deep Web: What is the latest technological breakthrough of Zhipu AI?

Zhang Peng: We brought a new generation of model, GLM-4. The brain itself has improved: it used to be a high-school student and may now have reached the level of a college student. And beyond raising the level of the brain, we have also given it hands, feet, eyes, and ears, along with some basic abilities to interact with the real world and the digital world.

Deep Web: Which stage of GPT can Zhipu AI's current level be benchmarked against?

Zhang Peng: We have always aimed at stages along the AGI path, doing an upgrade every 3-6 months, and each generation differs somewhat. For example, the previous generation solved the benchmarking of the model capability matrix; this time the model's capability is basically aligned with GPT-4, and GLM-4's basic capability is now comparable to GPT-4's.

Deep Web: Is the most important part of benchmarking against GPT-4 improving multimodal understanding?

Zhang Peng: Multimodal capability is a very important component. When we talk about multimodality, the first things that come to mind are text-to-image and image-to-text; the original driving force is to solve cross-modal understanding and generation. But what is its essence?

I was struck when I read GPT-4's report at the time: while everyone else was doing text-to-image generation, GPT-4 did image-to-text understanding, reasoning, and generation. Why?

My personal understanding is that human vision, hearing, touch, and so on are all primitive, basic capabilities for perceiving data and information, whereas language is artificial, using abstract symbols to describe those primitive signals. Essentially, language is a higher-order signal.

What makes humans human is language; its birth was a very important node. There are two such nodes: one is creating and using tools, the other is language. These are also the two big milestones for the development of artificial intelligence.

Generating concrete content from abstract data (text-to-image) is easier than converting concrete information into abstract information (image-to-text). Why? Concrete signals are easier to collect but less information-dense, and it is hard to extract a high-level signal from low-level signals: the noise must be eliminated to get at the most valuable part.

In essence, from low-level signals to high-level signals, it is more reflective of cognitive ability.

GPT-4 enabled image understanding and reasoning, which we believe is a very important direction. Throughout 2023 we spent a lot of energy on multimodality, working on image and text. The release of Gemini also validates this: Google likewise thinks it matters, and goes even further by unifying images, video, and audio into a single model for learning.

Deep Web: Are code-augmentation capabilities also a point of competition between large models?

Zhang Peng: Code augmentation addresses a more practical problem. Training the cognitive abilities of a language model is akin to rebuilding a brain, and requires stronger thinking, comprehension, reasoning, and cognition.

If the brain never contacts or interacts with the outside world, it remains a brain in a vat: no matter how capable it is, it cannot act on the real world. Code augmentation gives the large model the ability to interact, including the ability to search, so that it grows hands and feet, eyes and ears, obtains information more autonomously, and interacts more conveniently with external systems. Code augmentation lets large models generate greater value.


"2024 Keyword: Reaching for the Sky"

Deep Web: In 2024, what do you think are the themes and trends of domestic large models?

Zhang Peng: Reaching for the sky while standing on the ground. "Reaching for the sky" can also be expressed as innovation: technological innovation and application innovation are both about reaching upward, about making breakthroughs.

In 2023, domestic enterprises were catching up technologically, with Zhipu AI slightly ahead. In 2024 we hope to keep up with the world's most advanced level and try to reach for the sky in technology; GLM-4 is our latest attempt in that direction. In application, there is hammering the original nails, which everyone can think of: using large models to transform existing workflows. And there is finding new nails, which means attempting some breakthrough innovations.

Whether through technological breakthroughs or application innovation, it all has to come back to the company itself: doing solid business, converting it into revenue and income, and creating customer value.

Deep Web: Some domestic investors think OpenAI's technology itself is not that difficult?

Zhang Peng: That phrasing may not be rigorous. A more rigorous way to put it is that, in terms of technical principles, OpenAI holds no great mysteries, and many of the original techniques were not invented by OpenAI; it carried them forward, or pushed them to the extreme. But from the standpoint of engineering and deployment, that is a great achievement.

ChatGPT is a very tight closed loop running from technical principles through engineering and application to the market, and it is difficult to pull apart.

Deep Web: From a technical point of view, can domestic large models catch up with GPT?

Zhang Peng: It's still a catch-up situation, and we've been narrowing the gap. After all, latecomers have latecomer advantages: we have skipped some of the earlier explorations and focused on the relatively correct path. But to be honest, it's unlikely you can surpass them with that alone, because everyone's path is the same; at best you end up doing the same as GPT.

That's why Zhipu chose its self-developed GLM pre-training framework. We try to use innovative breakthroughs, whether local or across the whole chain, to improve our catch-up speed.

OpenAI started relatively early, and development speed shows up in the slope of the curve. Domestic large models started later; only by accumulating bit by bit to adjust development speed and steepen the slope of the curve can we expect to get closer and closer, until there is a crossing point.

So across the chain of algorithms, systems engineering, data, application, and deployment, all the innovations added together make it possible to surpass it.

Deep Web: In the future, every giant will have its own large model. Will products converge, and where will the differentiation lie?

Zhang Peng: Our company works on overall capability, and these capabilities are an indispensable part of the overall goal of AGI. Some players will lean toward applications, some toward specific industries, and differences will slowly emerge.


The "from 0 to 1" label

Deep Web: Did OpenAI's earlier personnel turmoil have a big impact on GPT's technological evolution?

Zhang Peng: At the moment, it doesn't seem to have had much impact.

Deep Web: In your opinion, along what main dimension does the gap between current domestic large models and Silicon Valley lie?

Zhang Peng: The gaps can be enumerated along various dimensions, but I think the essence is everyone's understanding of this matter. The world's top teams, represented by OpenAI and Google, clearly have a very deep understanding of large models.

Deep Web: Why is there such a difference?

Zhang Peng: Last year I took part in some forums and roundtables where everyone discussed how Chinese people are not very good at 0 to 1 but very good at 1 to 100. I keep asking myself why. Summarizing the past, taking the Internet and the mobile Internet as examples: China was not the origin of the technology, but in terms of application, Chinese companies are very strong, surpassing American companies.

Of course, these past experiences are not reason enough to put on a label and limit ourselves in thinking about 0-to-1 work. I have been thinking that we should throw this label away completely and not use it to limit the pace of innovation and progress.

Deep Web: What do you think is the essence of large models?

Zhang Peng: I think the large model is a technical means by which, in the process of exploring AGI, we try to understand or simulate cognitive abilities close to the human brain's; it is grounded in the behaviorist approach to artificial intelligence.

Deep Web: How do you view the competitive landscape in the second half of AGI?

Zhang Peng: Strictly speaking, the second half is not the second half of AGI but the second half of generative AI. As for how many more halves there will be after that, I don't know.

AGI is also not the same as generative AI. Large models may be a very effective technical means in our pursuit of AGI, but they may not represent AGI. AGI is hard, and there are still many problems that need to be solved.

Deep Web: What kind of node is the domestic general-purpose model at?

Zhang Peng: I think that after full competition in 2023, we can now gradually say we have entered a decisive moment.

"Closed source makes it easier for companies to earn revenue"

Deep Web: Will open source and closed source lead to two different technological and industrial paths?

Zhang Peng: Open source and closed source are indeed two different things. What is the relationship among open source, closed source, and commercial use?

In my opinion, open source is an indispensable part of the entire industry ecosystem; it is the vitality and driving force behind technological diversity and innovation. Open-source licenses permit some commercial use, but truly large-scale commercial use ultimately lands on closed source. In terms of business value, especially for medium and large customers, the choice involves not only the cost of the technology itself but also a series of issues such as its stability, support, consistency, service, and security.

From the perspective of commercial applications, the closed-source version may allow enterprises to obtain better returns and sustain better commercial services, so the purposes of open source and closed source differ, and so do their essences.

Deep Web: Can you share the progress of Zhipu's commercialization?

Zhang Peng: In terms of the overall commercialization path, we make some choices of our own, derived from our team's DNA and our overall judgment of the current market. We determined long ago that we want to do TO B, and we will focus our commercialization there. We will also do TO C, but with a relatively clear purpose: to close the loop and leave a possibility open for the future; TO C applications may well be the point that explodes later.

The open platform is in fact a concrete embodiment of our TO B services.

Deep Web: Before this, artificial intelligence hadn't exploded. Was it lacking an epoch-making product?

Zhang Peng: The explosion of artificial intelligence is not something a single product can decide. Take the previous generation of AI: can you say it wasn't explosive enough? Face recognition, payments, and voice are used every day; does that count as an explosion or not? So why do people feel it wasn't one? Perhaps subconsciously you feel it is not the artificial intelligence we imagined, but more like a tool.

Deep Web: Domestic phone makers are also developing on-device models, and Intel and Lenovo have begun pushing AI PCs. Will the AI revolution in these hardware fields reshape the related industries?

Zhang Peng: I think it very probably will. Everyone wants to use new things on their phones; the demand is there. The first question is how to get the technical path through, the second is reducing costs, and the third is improving the user experience; to a certain extent, these three things have to proceed simultaneously.
