Beating Gemini-1.5-Pro and GPT-4V: Cloudwalk's large model ranks among the global top three in multimodal capability

Author: Heart of the Machine Pro

Reported by the Heart of the Machine

Heart of the Machine Editorial Department

Recently, Cloudwalk Technology made significant progress on the multimodal leaderboard of OpenCompass, an authoritative comprehensive evaluation platform.

According to the latest evaluation results, Cloudwalk Technology's large model achieved an average score of 65.5, placing it among the top three in the world. It surpasses Google's Gemini-1.5-Pro and GPT-4V, and trails only GPT-4o (69.9) and Claude3.5-Sonnet (67.9).

Among Chinese models, it ranks first, ahead of InternVL-Chat (61.7) and GLM-4V (60.8).
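The ordering described above can be checked with a short sketch that ranks the models by the average scores quoted in this article. It is only an illustration under stated assumptions: the label "Cloudwalk large model" is a placeholder name chosen here, and Gemini-1.5-Pro and GPT-4V are left out because the article does not give their scores.

```python
# Average scores as quoted in this article (OpenCompass multimodal leaderboard).
# "Cloudwalk large model" is a placeholder label, not an official model name.
# Gemini-1.5-Pro and GPT-4V are omitted: the article does not state their scores.
reported_scores = {
    "GPT-4o": 69.9,
    "Claude3.5-Sonnet": 67.9,
    "Cloudwalk large model": 65.5,
    "InternVL-Chat": 61.7,
    "GLM-4V": 60.8,
}


def rank_models(scores):
    """Return (rank, name, score) tuples sorted by descending score."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


def gap(scores, a, b):
    """Score difference a - b, rounded to one decimal place."""
    return round(scores[a] - scores[b], 1)


ranking = rank_models(reported_scores)
for rank, name, score in ranking:
    print(f"{rank}. {name}: {score}")

# Margin between the Cloudwalk model and the next Chinese model listed.
print(gap(reported_scores, "Cloudwalk large model", "InternVL-Chat"))
```

With only these five quoted scores, the Cloudwalk model lands in third place overall and first among the Chinese models listed, which matches the claims in the text.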

Figure 1: OpenCompass multimodal leaderboard

OpenCompass is a complete, open-source, reproducible evaluation framework launched by Shanghai Artificial Intelligence Lab. Its multimodal evaluation uses 8 representative datasets to objectively quantify the capabilities of multimodal large models from multiple perspectives. These cover object detection, text recognition, action recognition, image understanding and relational reasoning, and mathematical reasoning, along with subject areas including art and design, business, science, health and medicine, humanities and social sciences, and technology and engineering.

Figure 2: Calm Large Model - 2.0 Multimodal Capability Example

In this evaluation, the large model ranked first among Chinese models on 6 of the datasets (MMBench, MMStar, MathVista, HallusionBench, AI2D, and OCRBench). On OCRBench in particular, it achieved the world's highest score of 827 (out of 1000), 13 points ahead of second-place GLM-4V. This result further strengthens the model's applicability in business scenarios such as text recognition, text-centric visual Q&A, document-oriented visual Q&A, and key information extraction.

Figure 3: Capabilities of Chinese large models on OpenCompass

The model's strong performance on this leaderboard relies on the efficient multimodal processing architecture and advanced computing technology developed by Cloudwalk Technology. Together they enable efficient fusion and switching between vision and language tasks and make the most of available computing resources, so that the model maintains high performance and response speed when processing large-scale multimodal data. They also make training more efficient, with faster convergence and more stable performance.

The result also reflects Cloudwalk Technology's long-term accumulation of expertise and continuous innovation in vision and language.

Figure 4: Calm Large Model - 2.0 Multimodal Capability Example

Previously, the large model set new world records 10 times in vision and cross-modal tasks, and its overall performance ranked among the top five in the world in the third-party SuperCLUE and C-Eval evaluations.

As a platform company focusing on the research and development of human-machine collaboration technology, Cloudwalk has been actively promoting the development and application of AI agents and large model technologies.

With the rapid development of artificial intelligence technology, multimodal large models have become a core engine driving industrial transformation. The model's outstanding performance in the OpenCompass open evaluation system not only recognizes Cloudwalk's strength in technological innovation but also sets an example for the industry, encouraging technology companies worldwide to scale new heights in the new round of AI competition.
