Overview
Gemma-7b is an 8.5-billion-parameter, text-to-text, decoder-only large language model developed by Google, built on the same research and technology as the Gemini models. It is designed for a wide range of text generation tasks and is released with open weights, making it accessible for applications and research alike. The model supports a context length of 8192 tokens, allowing it to process long inputs in a single pass.
Key Capabilities
- Versatile Text Generation: Excels in tasks such as question answering, summarization, and reasoning.
- Resource-Efficient Deployment: Its relatively small size enables deployment on devices with limited resources, including laptops, desktops, and personal cloud infrastructure.
- Fine-tuning Support: Provides scripts and notebooks for supervised fine-tuning (SFT) using techniques like QLoRA and FSDP, with examples for various datasets.
- Optimized Performance: Runs on CPU or on one or more GPUs, supports reduced-precision inference (float16, bfloat16) and 8-bit or 4-bit quantization for lower memory use, and supports Flash Attention 2 for faster attention computation.
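To make the precision and quantization options above concrete, here is a minimal sketch of how such loading choices could be expressed with the Hugging Face `transformers` API. The helper `build_load_kwargs` is a hypothetical convenience function introduced for illustration; the keyword names (`torch_dtype`, `load_in_8bit`, `load_in_4bit`, `attn_implementation`) follow `transformers` conventions but may vary across library versions.

```python
from typing import Optional


def build_load_kwargs(dtype: str = "bfloat16",
                      quantize_bits: Optional[int] = None,
                      use_flash_attention: bool = False) -> dict:
    """Assemble keyword arguments for AutoModelForCausalLM.from_pretrained.

    Hypothetical helper for illustration: maps the deployment options
    described above (precision, quantization, Flash Attention 2) onto
    `transformers`-style keyword arguments.
    """
    kwargs = {"torch_dtype": dtype, "device_map": "auto"}
    if quantize_bits == 8:
        kwargs["load_in_8bit"] = True   # 8-bit quantization via bitsandbytes
    elif quantize_bits == 4:
        kwargs["load_in_4bit"] = True   # 4-bit quantization via bitsandbytes
    if use_flash_attention:
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


# Usage sketch (requires `transformers`, `accelerate`, and model access;
# not run here):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("google/gemma-7b")
# model = AutoModelForCausalLM.from_pretrained(
#     "google/gemma-7b", **build_load_kwargs(quantize_bits=4))
```

Keeping the option-mapping in one small function makes it easy to switch between full-precision GPU inference and a quantized laptop-friendly configuration without touching the loading call itself.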
Benchmarks and Performance
Gemma-7b demonstrates strong performance across a range of benchmarks, with an average score of 56.9 over a diverse set of evaluations. Notable scores include 64.3 on MMLU, 81.2 on HellaSwag, and 46.4 on GSM8K, reflecting its capabilities in knowledge-based reasoning, common-sense inference, and mathematical problem-solving. The model was trained on 6 trillion tokens of diverse data, including web documents, code, and mathematical text, using Google's TPUv5e hardware and the JAX/ML Pathways software stack for efficient training.