
Google's strongest open-source model, Gemma 2, is released! 27 billion parameter surprise attack Llama 3

Source | Zhidongxi

Author | ZeR0

Editor | Desert Shadow

Zhidongxi reported on June 28 that, at last night's I/O Connect conference, Google made a big move and announced its strongest new-generation open-source model, Gemma 2.

Gemma 2 is available in 9-billion (9B) and 27-billion (27B) parameter sizes. The 27B model was trained on 13T tokens and the 9B model on 8T tokens; both have an 8,192-token context window and can be used in Google AI Studio. A 2.6-billion-parameter (2.6B) model, small enough to run locally on a mobile phone, will be released soon.

In the LMSYS Chatbot Arena, a blind-test arena for large language models, the instruction-tuned 27-billion-parameter Gemma 2 defeated the 70-billion-parameter Llama 3 and surpassed Nemotron 4 340B, Claude 3 Sonnet, Command R+, Qwen 72B, and other models, ranking first among all open-weight models. The 9B model is the best performer among current models at or below 15B parameters.


Earlier this year, Google launched Gemma, a lightweight advanced open-source model available only in 2B and 7B parameter versions, which has been downloaded more than 10 million times. Spanning 2.6 billion to 27 billion parameters, Gemma 2 delivers higher performance, higher inference efficiency, and significantly improved safety than the first generation, a big step forward for this model series.

The 27-billion-parameter Gemma 2 offers a competitive alternative to models more than twice its size, delivering performance that was not achievable at this scale until last December. It can also run inference efficiently at full precision on a single NVIDIA A100/H100 Tensor Core GPU or a single TPU host, dramatically reducing deployment costs.


On Hugging Face's benchmarks, Google compares Gemma 2 27B to the similarly sized Qwen 1.5 32B and also reports the performance of Llama 3 70B. Gemma 2 27B is only 40% the size of Llama 3 70B and was trained on less than two-thirds as much data. The results show that Gemma 2 27B outperforms Qwen 1.5 32B and trails Llama 3 70B by only a few percentage points.


1. A redesigned architecture: Gemma 2's three major features

Gemma 2's 15-page technical report describes several improvements to its architecture, including alternating local and global attention layers, grouped-query attention, and the use of knowledge distillation instead of plain next-token prediction to help train the smaller 2.6B and 9B models.
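The distillation idea can be illustrated with a minimal sketch: instead of training only on the one-hot next token, the student is pushed toward the teacher's full probability distribution over the vocabulary, typically via a KL-divergence loss. The temperature value and the pure-Python softmax below are illustrative assumptions, not details from the report.

```python
# Minimal sketch of a soft-target distillation loss for one token position.
# The temperature and exact objective here are assumptions for illustration;
# Google's actual training recipe is only described at a high level.
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over the vocabulary for one position.
    Zero when the student exactly matches the teacher's distribution."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(p_t * math.log(p_t / p_s)
               for p_t, p_s in zip(p_teacher, p_student) if p_t > 0)

# The student learns the teacher's whole distribution, not just its argmax token.
loss = distillation_loss([2.0, 1.0, 0.1], [2.5, 0.8, 0.2])
```

In plain next-token prediction the target is a one-hot vector; here the soft teacher distribution carries extra signal about which alternative tokens are plausible, which is why it helps the smaller models.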


▲ Parameter counts of the Gemma models

The 2.6B model was trained on a 2x16x16 configuration in a TPUv5e cluster, using a total of 512 chips. The 9B model was trained on an 8x16x32 configuration in a TPUv4 cluster with a total of 4096 chips. The 27B model was trained on an 8x24x32 configuration in a TPUv5p cluster, using a total of 6144 chips.
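The total chip counts follow directly from the slice configurations above, as the product of the slice dimensions (what each axis of a slice represents is not stated in the article):

```python
# Verify the per-model chip counts implied by the TPU slice shapes quoted above.
import math

slices = {"2.6B": (2, 16, 16), "9B": (8, 16, 32), "27B": (8, 24, 32)}
chips = {name: math.prod(dims) for name, dims in slices.items()}
print(chips)  # {'2.6B': 512, '9B': 4096, '27B': 6144}
```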


▲ Training infrastructure with slicing

For higher performance and inference efficiency, Google built Gemma 2 on a redesigned architecture. The model uses a training recipe similar to Gemma 1.1's, but with more teacher supervision and model merging. Gemma 2 improves significantly over version 1.1 in programming, math, reasoning, safety, and other abilities.


▲An overview of the main model parameters and design options

As a result, the Gemma 2 models offer the best performance at their scale and even compete with models 2-3 times larger. Here are their standout features:

(1) Superior performance: Gemma 2 27B delivers the best performance in its class and even competes with models more than twice its size. Gemma 2 9B also leads its class, surpassing Llama 3 8B and other open-source models of similar size.


Google compares the 2.6B, 9B, and 27B models on a variety of benchmarks, reporting both the average over the 8 benchmarks that can be compared with Llama 3 and the average over all benchmarks. The Llama 3 8B figures come from the Hugging Face leaderboard or its blog post.


On MMLU, the 9B model scored 71.3 and the 27B model 75.2; on AGIEval, 52.8 and 55.1; and on HumanEval, 40.2 and 51.8.

(2) Unparalleled efficiency and cost savings: The Gemma 2 27B model is designed to efficiently run full-precision inference on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or H100 Tensor Core GPU, significantly reducing costs while maintaining high performance. This makes AI deployments more accessible and affordable.

(3) Fast inference across hardware: Gemma 2 is optimized to run at remarkable speed on a wide range of hardware, from powerful gaming laptops and high-end desktops to cloud-based setups. Try Gemma 2 at full precision in Google AI Studio, unlock local performance with a quantized version via Gemma.cpp on your CPU, or try it on an NVIDIA RTX or GeForce RTX GPU on your home computer via Hugging Face Transformers.

2. Commercial use supported, broad framework compatibility, easy deployment

Built for developers and researchers, Gemma 2 is designed to be easier to integrate into workflows:

(1) Open and accessible: Like the original Gemma model, Gemma 2 is released under Google's commercially friendly Gemma license, allowing developers and researchers to share and commercialize their innovations.

(2) Broad framework compatibility: Gemma 2 is compatible with major AI frameworks such as Hugging Face Transformers, as well as JAX, PyTorch, and TensorFlow via native Keras 3.0, vLLM, Gemma.cpp, Llama.cpp, and Ollama. In addition, Gemma 2 is optimized with NVIDIA TensorRT-LLM to run on NVIDIA-accelerated infrastructure or as an NVIDIA NIM inference microservice. Users can fine-tune it with Keras and Hugging Face, and Google is actively working on more parameter-efficient fine-tuning options.
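As a concrete illustration of the Hugging Face Transformers path, a minimal loading-and-generation sketch might look like the following. The model id "google/gemma-2-9b-it" is the published instruction-tuned 9B checkpoint, but the device/dtype settings and generation parameters below are illustrative assumptions, and running it requires accepting the model license and having sufficient GPU memory.

```python
# Hedged sketch: loading the instruction-tuned 9B Gemma 2 checkpoint with
# Hugging Face Transformers. Settings below are assumptions for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers on available GPU(s)/CPU automatically
    torch_dtype="auto",  # use the checkpoint's native precision
)

# Gemma 2 is a chat model, so prompts go through the chat template.
messages = [{"role": "user", "content": "Summarize grouped-query attention in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```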

(3) Easy deployment: Starting next month, Google Cloud customers will be able to easily deploy and manage Gemma 2 on Vertex AI.

The new Gemma Cookbook is a collection of practical examples and guides that walk users through building their own applications and fine-tuning Gemma 2 for specific tasks.

3. Responsible AI development resources, with rigorous safety testing and evaluation

When it comes to responsible AI development, Google provides the resources needed to build and deploy AI responsibly, including a responsible generative AI toolkit. The recently open-sourced LLM Comparator helps developers and researchers evaluate language models in depth.

Users can now use the companion Python library to compare and evaluate their own models and data and visualize the results in their applications. In addition, Google is actively working to open-source its text watermarking technology, SynthID, for use with the Gemma models.

When training Gemma 2, Google followed its internal safety processes, filtered the pre-training data, and rigorously tested and evaluated the model against a comprehensive set of metrics to identify and mitigate potential biases and risks. Google publishes its results on a large number of public benchmarks related to safety and representational harms.


▲ Safety academic benchmark results of the Gemma 2 IT model and the Gemma 1.1 IT model

Conclusion: Large-model R&D is becoming more pragmatic

Google's progress with Gemma 2 reflects a current trend in large-model research: using lighter, more practical models to achieve strong performance while remaining easy to deploy, in order to better meet different user needs.

Google offers developers and researchers multiple ways to use these models. Gemma 2 is now available in Google AI Studio, where its full 27-billion-parameter performance can be tested without any hardware requirements; model weights can be downloaded from Kaggle and Hugging Face Models, with availability in Vertex AI Model Garden coming soon.

With Gemma 2, Google has demonstrated that distillation is an effective way to train such models, and that training on the teacher's output probabilities yields better results than pure next-token prediction. The model still has limitations, and future research is needed to improve factuality, robustness to adversarial attacks, and reasoning and consistency.

To support research and development, Gemma 2 is also available for free through Kaggle or through the free tier of Colab notebooks. First-time users of Google Cloud services may be eligible for a $300 credit, and academic researchers can apply to the Gemma 2 Academic Research Program to receive Google Cloud credits to accelerate their research on Gemma 2. The application deadline is Aug. 9.

Source: Google DeepMind
