
Google's strongest open-source model, Gemma 2, is released! 27 billion parameter surprise attack Llama 3

Source | Zhidongxi

Author | ZeR0

Editor | Desert Shadow

Zhidongxi reported on June 28 that, at last night's I/O Connect conference, Google made a big move and announced its strongest new-generation open-source model, Gemma 2.

Gemma 2 is available in 9-billion (9B) and 27-billion (27B) parameter sizes. The 27B model was trained on 13T tokens and the 9B model on 8T tokens; both have an 8,192-token context window and can be used in Google AI Studio. A 2.6-billion-parameter (2.6B) model, small enough to run locally on a mobile phone, will be released soon.

In the LMSYS Chatbot Arena, a blind-test arena for large language models, the instruction-tuned 27-billion-parameter Gemma 2 defeated the 70-billion-parameter Llama 3 and surpassed Nemotron 4 340B, Claude 3 Sonnet, Command R+, Qwen 72B, and other models, ranking first among all open-weight models. The 9B model is the best performer among current models at or below 15B parameters.


Earlier this year, Google launched Gemma, a lightweight advanced open-source model available only in 2B and 7B parameter versions, which has been downloaded more than 10 million times. Spanning 2.6 billion to 27 billion parameters, Gemma 2 delivers higher performance, higher inference efficiency, and significantly improved safety than the first generation, a big step forward for this model series.

The 27-billion-parameter Gemma 2 offers a competitive alternative to models more than twice its size, delivering performance that was not achievable at this scale until last December. It can also run inference efficiently at full precision on a single NVIDIA A100/H100 Tensor Core GPU or a single TPU host, dramatically reducing deployment costs.


On Hugging Face's benchmarks, Google compares Gemma 2 27B to the similarly sized Qwen 1.5 32B and also reports the performance of Llama 3 70B. Gemma 2 27B is only 40% the size of Llama 3 70B and was trained on less than two-thirds as much data. The results show that Gemma 2 27B outperforms Qwen 1.5 32B and trails Llama 3 70B by only a few percentage points.


1. A redesigned architecture: Gemma 2's three major features

Gemma 2's 15-page technical report describes several improvements to its architecture, including alternating local and global attention layers, grouped-query attention, and the use of knowledge distillation instead of plain next-token prediction to help train the smaller 2.6B and 9B models.
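The distillation idea can be illustrated with a minimal sketch: instead of training only on the one-hot next token, the student is pushed toward the teacher's full probability distribution over the vocabulary, typically via a KL-divergence loss. The temperature value and the pure-Python softmax below are illustrative assumptions, not details from the report.

```python
# Minimal sketch of a soft-target distillation loss for one token position.
# The temperature and exact objective here are assumptions for illustration;
# Google's actual training recipe is only described at a high level.
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over the vocabulary for one position.
    Zero when the student exactly matches the teacher's distribution."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(p_t * math.log(p_t / p_s)
               for p_t, p_s in zip(p_teacher, p_student) if p_t > 0)

# The student learns the teacher's whole distribution, not just its argmax token.
loss = distillation_loss([2.0, 1.0, 0.1], [2.5, 0.8, 0.2])
```

In plain next-token prediction the target is a one-hot vector; here the soft teacher distribution carries extra signal about which alternative tokens are plausible, which is why it helps the smaller models.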


▲ Parameter counts of the Gemma models

The 2.6B model was trained on a 2x16x16 configuration in a TPUv5e cluster, using a total of 512 chips. The 9B model was trained on an 8x16x32 configuration in a TPUv4 cluster with a total of 4096 chips. The 27B model was trained on an 8x24x32 configuration in a TPUv5p cluster, using a total of 6144 chips.
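The total chip counts follow directly from the slice configurations above, as the product of the slice dimensions (what each axis of a slice represents is not stated in the article):

```python
# Verify the per-model chip counts implied by the TPU slice shapes quoted above.
import math

slices = {"2.6B": (2, 16, 16), "9B": (8, 16, 32), "27B": (8, 24, 32)}
chips = {name: math.prod(dims) for name, dims in slices.items()}
print(chips)  # {'2.6B': 512, '9B': 4096, '27B': 6144}
```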


▲ Training infrastructure with slicing

For higher performance and inference efficiency, Google built Gemma 2 on a redesigned architecture. The model uses a training recipe similar to Gemma 1.1's, but with more teacher supervision and model merging. Gemma 2 improves significantly over version 1.1 in programming, math, reasoning, safety, and other abilities.


▲An overview of the main model parameters and design options

As a result, the Gemma 2 models offer the best performance at their scale and even compete with models 2-3 times larger. Here are their standout features:

(1) Superior performance: Gemma 2 27B delivers the best performance in its class and even competes with models more than twice its size. Gemma 2 9B also leads its class, surpassing Llama 3 8B and other open-source models of similar size.


Google compares the 2.6B, 9B, and 27B models on a variety of benchmarks, reporting both the average over the 8 benchmarks that can be compared with Llama 3 and the average over all benchmarks. The Llama 3 8B figures come from the Hugging Face leaderboard or its blog post.


On MMLU, the 9B model scored 71.3 and the 27B model 75.2; on AGIEval, 52.8 and 55.1; and on HumanEval, 40.2 and 51.8.

(2) Unparalleled efficiency and cost savings: The Gemma 2 27B model is designed to efficiently run full-precision inference on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or H100 Tensor Core GPU, significantly reducing costs while maintaining high performance. This makes AI deployments more accessible and affordable.

(3) Fast inference across hardware: Gemma 2 is optimized to run at remarkable speed on a wide range of hardware, from powerful gaming laptops and high-end desktops to cloud-based setups. Try Gemma 2 at full precision in Google AI Studio, unlock local performance with a quantized version via Gemma.cpp on your CPU, or try it on an NVIDIA RTX or GeForce RTX GPU on your home computer via Hugging Face Transformers.

2. Commercial use supported, broad framework compatibility, easy deployment

Built for developers and researchers, Gemma 2 is designed to be easier to integrate into workflows:

(1) Open and accessible: Like the original Gemma model, Gemma 2 is released under Google's commercially friendly Gemma license, allowing developers and researchers to share and commercialize their innovations.

(2) Broad framework compatibility: Gemma 2 is compatible with major AI frameworks such as Hugging Face Transformers, as well as JAX, PyTorch, and TensorFlow via native Keras 3.0, vLLM, Gemma.cpp, Llama.cpp, and Ollama. In addition, Gemma 2 is optimized with NVIDIA TensorRT-LLM to run on NVIDIA-accelerated infrastructure or as an NVIDIA NIM inference microservice. Users can fine-tune it with Keras and Hugging Face, and Google is actively working on more parameter-efficient fine-tuning options.
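As a concrete illustration of the Hugging Face Transformers path, a minimal loading-and-generation sketch might look like the following. The model id "google/gemma-2-9b-it" is the published instruction-tuned 9B checkpoint, but the device/dtype settings and generation parameters below are illustrative assumptions, and running it requires accepting the model license and having sufficient GPU memory.

```python
# Hedged sketch: loading the instruction-tuned 9B Gemma 2 checkpoint with
# Hugging Face Transformers. Settings below are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers on available GPU(s)/CPU automatically
    torch_dtype="auto",  # use the checkpoint's native precision
)

# Gemma 2 is a chat model, so prompts go through the chat template.
messages = [{"role": "user", "content": "Summarize grouped-query attention in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```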

(3) Easy deployment: Starting next month, Google Cloud customers will be able to easily deploy and manage Gemma 2 on Vertex AI.

The new Gemma Cookbook is a collection of practical examples and guides that walk users through building their own applications and fine-tuning Gemma 2 for specific tasks.

3. Responsible AI development resources, with rigorous safety testing and evaluation

When it comes to responsible AI development, Google provides the resources needed to build and deploy AI responsibly, including a responsible generative AI toolkit. The recently open-sourced LLM Comparator helps developers and researchers evaluate language models in depth.

Users can now use the companion Python library to compare and evaluate their own models and data and visualize the results in their applications. In addition, Google is actively working to open-source its text watermarking technology, SynthID, for use with the Gemma models.

When training Gemma 2, Google followed its internal safety processes, filtered the pre-training data, and rigorously tested and evaluated the model against a comprehensive set of metrics to identify and mitigate potential biases and risks. Google publishes its results on a large number of public benchmarks related to safety and representational harms.


▲ Safety academic benchmark results of the Gemma 2 IT model and the Gemma 1.1 IT model

Conclusion: Large-model R&D is becoming more pragmatic

Google's progress with Gemma 2 reflects a current trend in large-model research: using lighter, more practical models to achieve strong performance while remaining easy to deploy, in order to better meet different user needs.

Google offers developers and researchers multiple ways to use these models. Gemma 2 is now available in Google AI Studio, where its full 27-billion-parameter performance can be tested without any hardware requirements; model weights can be downloaded from Kaggle and Hugging Face Models, with availability in Vertex AI Model Garden coming soon.

With Gemma 2, Google has demonstrated that distillation is an effective way to train such models, and that training on the teacher's output probabilities yields better results than pure next-token prediction. The model still has limitations, and future research is needed to improve factuality, robustness to adversarial attacks, and reasoning and consistency.

To support research and development, Gemma 2 is also available for free through Kaggle or through the free tier of Colab notebooks. First-time users of Google Cloud services may be eligible for a $300 credit, and academic researchers can apply to the Gemma 2 Academic Research Program to receive Google Cloud credits to accelerate their research on Gemma 2. The application deadline is Aug. 9.

Source: Google DeepMind
