
OpenAI Terminates API Services for China; Zhou Hongyi Responds: It Cannot Suppress the Development of Domestic Large Models

National Business Daily

2024-06-26 13:22, posted on the official account of Sichuan Daily Economic News

Edited by: Bi Luming

On June 26, Zhou Hongyi (@周鸿祎), founder and chairman of 360 Group, posted a video on Weibo saying that "OpenAI's suspension of services in China can only accelerate the development of China's own large model industry, which is not necessarily a bad thing." He explained: "With OpenAI's API no longer callable, domestic applications are forced to choose domestic large models, and the gap between domestic large models and GPT has gradually narrowed."


On the news front, on June 25 OpenAI sent an email to Chinese users saying that it would block API traffic from unsupported countries and regions starting July 9. Affected organizations that wish to continue using the service must access it from a supported country or region. OpenAI's API is currently open to 161 countries and regions; since China is not among them, this means OpenAI will terminate its API services to China.

It is also worth noting that the Daily Economic News Large Model Evaluation Report (Phase I), released on June 25, found that domestic large models are catching up with and overtaking overseas large models across the board; the detailed findings are covered below.

OpenAI "suspends" China's API!

According to media reports, on June 25 OpenAI announced it would terminate API services for regions including mainland China. In the early morning of June 25, some developers received an official email from OpenAI.

The email reads: "Our data shows that your organization's API traffic is coming from a region that OpenAI does not currently support. You can find supported countries and regions here. Starting July 9, we will be taking additional steps to block API traffic from regions that are not on our list of supported countries and regions. To continue using OpenAI's services, you will need to access them from a supported region."

OpenAI's API is reportedly open to 161 countries and regions, but mainland China is not among them. This means OpenAI has in effect announced the termination of its API services to mainland China.


Alibaba Cloud Bailian promptly announced that it will offer OpenAI API users the most cost-effective domestic large model alternative, providing Chinese developers with 22 million free tokens and dedicated migration services. On Stanford's latest large model evaluation leaderboard, HELM MMLU, Qwen2-72B scored 0.824, tied with GPT-4 for fourth place globally. Qwen-plus, the main Tongyi Qianwen model benchmarked against GPT-4, is priced at 0.004 yuan per 1,000 tokens on Alibaba Cloud Bailian, only one-fiftieth of GPT-4's price.
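For developers weighing these migration offers, the switch is in practice often little more than a configuration change, since several domestic providers expose OpenAI-compatible endpoints. Below is a minimal sketch using the official openai Python SDK; the endpoint URL and the qwen-plus model identifier are illustrative assumptions rather than details taken from this article, and should be checked against the provider's documentation.

```python
# Minimal migration sketch: point the official openai SDK at an
# OpenAI-compatible endpoint offered by a domestic provider.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PROVIDER_API_KEY",  # key issued by the new provider, not by OpenAI
    base_url="https://example-provider.com/compatible-mode/v1",  # hypothetical OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="qwen-plus",  # assumed model identifier; confirm in the provider's docs
    messages=[{"role": "user", "content": "Write a one-line financial news headline about interest rates."}],
)
print(response.choices[0].message.content)
```

Because only the API key, base URL, and model name change, existing prompt and response-handling code can usually be kept as-is.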

On June 25, Zhipu launched a special migration plan for OpenAI API users to help them switch to domestic large models. Specifically, Zhipu provides developers with 150 million tokens (50 million GLM-4 plus 100 million GLM-4-Air) and a series of migration training sessions from OpenAI to GLM. For high-usage customers, Zhipu offers a token gift program (with no upper limit) matching their OpenAI usage scale, as well as concurrency capacity equivalent to what they had with OpenAI.

On June 25, Baidu Intelligent Cloud Qianfan launched a large model inclusive program, offering newly registered enterprise users services such as zero-cost model calls, zero-cost training, and zero-cost migration, effective immediately.

Among these offers, Wenxin's flagship models are free for the first time: the flagship ERNIE 3.5 model comes with 50 million free tokens, while the main models ERNIE Speed and ERNIE Lite and the lightweight model ERNIE Tiny remain free. Users migrating from OpenAI additionally receive an ERNIE 3.5 flagship model token package matching their OpenAI usage scale. The above promotions are valid until midnight on July 25, 2024.

Domestic large models are catching up across the board

Recently, the "Daily Economic News Large Model Evaluation Team", which was formed by more than 30 outstanding reporters, editors and subsidiaries of the Daily Economic News, conducted an in-depth evaluation of the performance and ability of the mainstream large model in the financial news work scene for two months, and released the "Daily Economic News Large Model Evaluation Report" (Phase I) on June 25.


According to the "Daily Economic News Large Model Evaluation Report" (Phase I), the domestic large model is catching up with and surpassing the overseas large model in an all-round way, and Yi-Large has become the biggest "dark horse", ranking first in the total score of the four major application scenarios of "financial news headline creation", "Weibo news writing", "article error proofreading" and "financial data calculation and analysis". High-Flyer DeepSeek-V2 and Baichuan4 showed powerful data calculation and analysis capabilities in the "Financial Data Calculation and Analysis" scenario. GPT 4.0, which has always been highly regarded by all walks of life, did not perform well in this evaluation, and even ranked at the bottom of the "financial news headline creation" scenario.

After evaluation, the "Daily Economic News Large Model Evaluation Report" (Phase I) came to the following conclusions.

Conclusion 1: Domestic large models are catching up across the board

Domestic large models are increasingly competitive; across multiple tasks, their performance is closing in on that of foreign large models.

Domestic large models ranked high in several test scenarios. SenseTime's SenseChat-5 made the top five three times, beating Google's Gemini 1.5 Pro twice. Among foreign models, Anthropic's Claude 3 Opus also ranked in the top five in three evaluation scenarios, and Google's Gemini 1.5 Pro ranked first in the "financial news headline creation" and "article error proofreading" scenarios. Surprisingly, GPT 4.0, long held in high regard, performed poorly overall in this evaluation, failing to make the top five in any scenario and even ranking at the bottom in "financial news headline creation".

In the "Financial News Headline Creation" scenario, SenseChat 5, Byte Doubao-pro-32k, and Baidu ERNIE 4.0 are on a par with Google's Gemini 1.5 Pro in terms of accurate information extraction and prominent important news points.

"微博新闻写作"场景中,百度文心ERNIE 4.0、商汤SenseChat-5等模型的总分与国外模型Anthropic Claude 3 Opus并列第一。

In the "article error proofreading" scenario, Yi-Large is the only domestic large model with a score of more than 100 points. The domestic model can understand Chinese sentence patterns and expression norms better than foreign models. But there is room for improvement in finding and correcting more precise tasks such as typos, misuse of punctuation, errors in numbers and quantifiers, and errors in facts and information.

In the "Financial Data Calculation and Analysis" scenario, although Anthropic Claude 3 Opus leads the overall score, it does not have much advantage over DeepSeek-V2 and Yi-Large of Magic Quadrant. In particular, High-Flyer DeepSeek-V2 has become a "dark horse" in this scenario evaluation, and its "financial data analysis" ability is outstanding.

Conclusion 2: Large models have their own expertise

The performance of different models varies significantly across specific scenarios, dimensions, and indicators, reflecting their respective areas of expertise.

For example, Google's Gemini 1.5 Pro ranked first in the "financial news headline creation" and "article error proofreading" scenarios, yet ranked near the bottom overall in the "Weibo news writing" scenario.

Anthropic's Claude 3 Opus, High-Flyer's DeepSeek-V2, and Baichuan Intelligence's Baichuan4 demonstrated strong data calculation capabilities.

Conclusion 3: There are significant differences in cross-linguistic environments

Taking the "Weibo news writing" scenario as an example, Baidu Wenxin ERNIE 4.0, SenseTime SenseChat-5 and Anthropic Claude 3 Opus tied for first place. This reflects the outstanding performance of the domestic model in the domestic social media scene of Weibo. The domestic large model can accurately grasp the content preferences and communication methods of Weibo users, and generate Weibo copywriting that meets the characteristics of the platform and user expectations.

In contrast, the Google Gemini 1.5 Pro scored 0 in the operational dimension of Weibo writing, possibly due to its unfamiliarity with the characteristics of the Weibo platform and user behavior.

In the Chinese-language context, GPT 4.0 did not rank well in any of the four scenarios. This highlights the adaptation challenge large models face across languages and cultures, and also shows that domestic large models have a natural advantage in localized applications.

Conclusion 4: The ability to extract information is uneven

Accurately extracting key information from an article is a key test of a large model's capability. The "article error proofreading" scenario in this evaluation includes a test of this ability.

Google's Gemini 1.5 Pro set itself apart from other large models with its ability to find and correct typos, misused punctuation, errors in numbers and quantifiers, and factual errors.

By contrast, Yi-Large ranked first in finding and correcting faulty sentences and could have challenged Google's Gemini 1.5 Pro, but its performance in the other error-finding tasks lagged behind.

Differences in information extraction ability may be related to a model's training data, algorithm design, and ability to capture linguistic nuance. Improving a large model's information extraction ability improves the accuracy of its output and makes it better suited to news work that demands high accuracy.

Daily Economic News, compiled from @Zhou Hongyi
