google/gemma-7b

Warm
Public
8.5B
FP8
8192
Feb 8, 2024
License: gemma
Hugging Face
Gated
Overview

Gemma-7b is an 8.5 billion parameter, text-to-text, decoder-only large language model developed by Google, built on the same research and technology as the Gemini models. It is designed for a wide range of text generation tasks and is released with open weights, making it accessible for applications and research alike. The model supports a context length of 8192 tokens, allowing it to process substantial amounts of text in a single prompt.

Key Capabilities

  • Versatile Text Generation: Excels in tasks such as question answering, summarization, and reasoning.
  • Resource-Efficient Deployment: Its relatively small size enables deployment on devices with limited resources, including laptops, desktops, and personal cloud infrastructure.
  • Fine-tuning Support: Provides scripts and notebooks for supervised fine-tuning (SFT) using techniques like QLoRA and FSDP, with examples for various datasets.
  • Optimized Performance: Runs on CPUs, single GPUs, or multiple GPUs, and supports reduced precisions (float16, bfloat16) and quantization (8-bit, 4-bit) for greater efficiency. It also supports Flash Attention 2 for faster inference.
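
To make the resource-efficiency point above concrete, here is a back-of-envelope sketch of the memory needed just to hold the model's weights at the precisions and quantization levels listed. It uses the 8.5B parameter count stated on this card; the helper function name is illustrative, and real deployments also need memory for activations, the KV cache, and framework overhead.

```python
# Rough weight-memory estimate for an 8.5B-parameter model at
# several precisions. Illustrative only: excludes activations,
# KV cache, and framework overhead.

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate gigabytes needed to store the weights alone."""
    return n_params * bits_per_param / 8 / 1e9

N = 8.5e9  # Gemma-7b parameter count from this model card

for name, bits in [("float16/bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name:>16}: ~{weight_memory_gb(N, bits):.1f} GB")
```

This is why 4-bit quantization brings the model within reach of a single consumer GPU or a well-equipped laptop, while full bfloat16 weights alone need roughly 17 GB.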

Benchmarks and Performance

Gemma-7b demonstrates strong performance across a diverse set of benchmarks, with an average score of 56.9. Notable results include 64.3 on MMLU, 81.2 on HellaSwag, and 46.4 on GSM8K, reflecting its capabilities in knowledge-based reasoning, common sense, and mathematical problem-solving. The model was trained on 6 trillion tokens of data, including web documents, code, and mathematical text, using Google's TPUv5e hardware with JAX and ML Pathways software for efficient training.