Beating Gemini-1.5-Pro and GPT-4V: Cloudwalk's large model ranks among the world's top three in multimodal capability

Author: Heart of the Machine Pro 2024-06-28 18:36:00

Reported by Heart of the Machine

Heart of the Machine Editorial Department

Cloudwalk Technology recently posted a strong result in the multimodal evaluation on OpenCompass, an authoritative comprehensive evaluation platform.

According to the latest evaluation results, Cloudwalk Technology's large model has an average score of 65.5, which places it among the top three in the world. It surpasses Google's Gemini-1.5-Pro and GPT-4V and trails only GPT-4o (69.9) and Claude3.5-Sonnet (67.9).

Among Chinese models, it ranks first, ahead of InternVL-Chat (61.7) and GLM-4V (60.8).

Figure 1: OpenCompass multimodal leaderboard

OpenCompass is a complete, open-source, reproducible evaluation framework launched by the Shanghai Artificial Intelligence Laboratory. Its multimodal evaluation uses 8 representative datasets to quantify the capabilities of multimodal large models objectively and from multiple perspectives. The datasets cover object detection, text recognition, action recognition, image understanding and relational reasoning, art and design, business, science, health and medicine, humanities and social sciences, technology and engineering, mathematical reasoning, and other areas.

Figure 2: Calm Large Model - 2.0 Multimodal Capability Example

In this evaluation, the large model ranked first in China on 6 of the datasets (MMBench, MMStar, MathVista, HallusionBench, AI2D, and OCRBench). On OCRBench in particular, it achieved the world's highest score, 827 out of 1000, which is 13 points ahead of second-place GLM-4V. This result strengthens the model's applicability in business scenarios such as text recognition, text-centric visual Q&A, document-oriented visual Q&A, and key information extraction.

Figure 3: Capabilities of Chinese large models on OpenCompass

The large model's strong performance in this evaluation rests on the efficient multimodal processing architecture and advanced computing technology developed by Cloudwalk Technology. The architecture fuses and switches efficiently between vision and language tasks and makes the most of available computing resources, so the model maintains high performance and response speed when processing large-scale multimodal data. It also makes training more efficient, with faster convergence and more stable performance.

The result also reflects Cloudwalk Technology's long-term accumulation and continued innovation in vision and language.

Figure 4: Calm Large Model - 2.0 Multimodal Capability Example

Previously, the large model set world records 10 times in the vision and cross-modal fields, and its overall performance ranked among the top five in the world in the third-party SuperCLUE and C-Eval evaluations.

As a platform company focused on the research and development of human-machine collaboration technology, Cloudwalk has been actively promoting the development and application of AI agents and large model technologies.

With the rapid development of artificial intelligence, multimodal large models have become a core engine of industrial transformation. The large model's performance in the OpenCompass open evaluation system is a recognition of Cloudwalk's strength in technological innovation. It also sets an example for the industry, encouraging technology companies worldwide to reach new heights in the new round of artificial intelligence competition.
