On September 19, at the Apsara Conference, Zhou Jingren, CTO of Alibaba Cloud, released Qwen2.5, the new generation of Tongyi Qianwen open-source models. The flagship Qwen2.5-72B surpasses Llama 405B in performance, once again taking the top spot among global open-source large models. The Qwen2.5 series covers large language models, multimodal models, mathematical models, and code models in multiple sizes; each size comes in a base version, an instruction-following version, and quantized versions, for a total of more than 100 released models, a new industry record.
All Qwen2.5 models are pre-trained on 18T tokens of data, and overall performance improves by more than 18% over Qwen2, with more knowledge and stronger programming and mathematical skills. The Qwen2.5-72B model scores 86.8 on MMLU-redux (general knowledge), 88.2 on MBPP (coding ability), and 83.1 on MATH (mathematical ability).
Qwen2.5 supports context lengths of up to 128K tokens and can generate up to 8K tokens. The models have strong multilingual capabilities, supporting more than 29 languages, including Chinese, English, French, Spanish, Russian, Japanese, Vietnamese, and Arabic. They respond smoothly to a variety of system prompts, enabling tasks such as role-playing and chatbot building. Qwen2.5 also makes significant progress in instruction following, understanding structured data (e.g., tables), and generating structured output (especially JSON).
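As a concrete illustration of the system-prompt and JSON-output behavior described above, here is a minimal sketch using the Hugging Face transformers chat interface. The checkpoint name Qwen/Qwen2.5-7B-Instruct and the prompts are illustrative assumptions, not taken from the announcement.

```python
# Minimal sketch: steering a Qwen2.5 instruct model toward JSON output
# with a system prompt, via Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    # System prompt constraining the output format.
    {"role": "system",
     "content": "You are a helpful assistant that replies only with valid JSON."},
    {"role": "user",
     "content": "Extract the model name and release date from: "
                "'Qwen2.5 was released on September 19, 2024.'"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The same pattern applies to any of the instruct checkpoints in the series; only the model ID changes.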
In terms of language models, Qwen2.5 has been open-sourced in seven sizes, 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B, each setting state-of-the-art results for its parameter class. The 32B is the "price/performance king" most anticipated by developers, striking the best balance between performance and resource consumption; overall, Qwen2.5-32B outperforms Qwen2-72B.
In more than ten benchmarks including MMLU-redux, Qwen2.5-72B outperforms Llama3.1-405B
The 72B is the flagship model of the Qwen2.5 series. Its instruction-following version, Qwen2.5-72B-Instruct, performs strongly on authoritative benchmarks including MMLU-redux, MATH, MBPP, LiveCodeBench, Arena-Hard, AlignBench, MT-Bench, and MultiPL-E. On several core tasks, it surpasses Llama3.1-405B and its 405 billion parameters with less than one-fifth as many, holding its position as the "world's strongest open-source large model."
In terms of specialized models, Qwen2.5-Coder for programming and Qwen2.5-Math for mathematics are both substantial improvements over their predecessors. Qwen2.5-Coder is trained on up to 5.5T tokens of programming-related data; the 1.5B and 7B versions were open-sourced on the same day, with a 32B version to follow. Qwen2.5-Math was open-sourced in three sizes, 1.5B, 7B, and 72B, together with a mathematical reward model, Qwen2.5-Math-RM.
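To give a sense of how a coder variant is typically used, below is a minimal plain-completion sketch with transformers. The checkpoint name Qwen/Qwen2.5-Coder-7B is assumed to follow the series' naming convention, and the prompt is illustrative.

```python
# Minimal sketch: plain code completion with a Qwen2.5-Coder base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model continues raw text, so a function signature plus a
# docstring is enough to elicit an implementation.
prompt = (
    "def fibonacci(n: int) -> int:\n"
    '    """Return the n-th Fibonacci number."""\n'
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```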
In terms of multimodal models, the widely anticipated visual language model Qwen2-VL-72B is now officially open-sourced. It can recognize images of different resolutions and aspect ratios, understand videos longer than 20 minutes, and act as an agent that autonomously operates mobile phones and robots. Recently, the authoritative LMSYS Chatbot Arena Leaderboard released its latest visual model evaluation results, and Qwen2-VL-72B emerged as the world's highest-scoring open-source model.
Qwen2-VL-72B became the world's highest-scoring open-source visual understanding model on the authoritative LMSYS Chatbot Arena Leaderboard
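For reference, here is a minimal image-understanding sketch using the transformers interface for Qwen2-VL. The smaller Qwen/Qwen2-VL-7B-Instruct sibling and the image URL are illustrative stand-ins; the open-sourced 72B checkpoint is assumed to expose the same interface.

```python
# Minimal sketch: asking a Qwen2-VL model to describe an image.
# Requires a recent transformers release that ships Qwen2VLForConditionalGeneration.
import requests
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Placeholder URL; substitute any accessible image.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]},
]
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens.
print(processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[-1]:], skip_special_tokens=True
)[0])
```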
Since going open source in August 2023, Tongyi has come from behind in the global open-source large-model field to become the preferred model for developers, especially Chinese developers. In performance terms, the Tongyi models have steadily caught up with Llama, the strongest open-source models from the United States, and have repeatedly topped Hugging Face's global large-model leaderboards. As of mid-September 2024, downloads of the Tongyi Qianwen open-source models had exceeded 40 million, and Qwen-series derivative models exceeded 50,000 in total, making it a world-class model family second only to Llama.
According to Hugging Face data, as of mid-September the total number of original and derivative Qwen-series models exceeded 50,000