mhenrichsen/gemma-7b
mhenrichsen/gemma-7b is a reupload of Google's Gemma 7B, an 8.5-billion-parameter, text-to-text, decoder-only large language model built with the same research and technology as the Gemini models. It is designed for a variety of text generation tasks, including question answering, summarization, and reasoning. Its relatively small size and open weights make it suitable for deployment in resource-limited environments such as laptops or desktops, democratizing access to advanced AI capabilities.
Overview
mhenrichsen/gemma-7b is a reupload of Google's Gemma 7B base model, part of a family of lightweight, open-weight, text-to-text, decoder-only large language models. Developed by Google, these models are built using the same research and technology as the Gemini models. The 7B variant is pre-trained and available in English, offering capabilities for various text generation tasks.
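As a base model on the Hugging Face Hub, it can be loaded with the standard transformers API. A minimal sketch (not from the original card; assumes `transformers` and `torch` are installed and that the machine has enough memory for 8.5B parameters in bfloat16 — weights download on first use):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mhenrichsen/gemma-7b"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy-decode a completion from the base model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # halves memory vs. float32
        device_map="auto",           # place layers on available GPUs/CPU
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("The key idea behind decoder-only transformers is"))
```

Note that this is the pre-trained base model, not an instruction-tuned variant, so it is best prompted with text to complete rather than chat-style instructions.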
Key Capabilities
- Text Generation: Excels at tasks such as question answering, summarization, and reasoning.
- Resource Efficiency: Its relatively small size (8.5B parameters) allows for deployment on devices with limited resources, including laptops and desktops.
- Fine-tuning Support: Provides examples and scripts for supervised fine-tuning (SFT) using techniques like QLoRA and FSDP.
- Hardware Optimization: Trained on Google's Tensor Processing Unit (TPUv5e) hardware, leveraging JAX and ML Pathways for efficient training.
Performance Highlights
The 7B model demonstrates strong performance across various benchmarks, including:
- MMLU: 64.3 (5-shot, top-1)
- HellaSwag: 81.2 (0-shot)
- HumanEval: 32.3 (pass@1)
- GSM8K: 46.4 (maj@1)
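For context on the HumanEval figure: pass@1 is the probability that a single generated code sample passes the problem's unit tests. A minimal sketch of the standard unbiased pass@k estimator (not part of the original card; `n` samples drawn, `c` of them correct):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples,
    drawn without replacement from n generations (c correct), passes."""
    if n - c < k:
        return 1.0  # too few failures to fill a k-sample draw
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 4 samples of which 2 pass, pass@1 is 2/4.
print(pass_at_k(4, 2, 1))  # → 0.5
```

The maj@1 metric on GSM8K is analogous but answer-based: a problem counts as solved when the (single-sample) majority answer matches the reference.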
Intended Usage
This model is well-suited for:
- Content creation (poems, scripts, code, marketing copy).
- Chatbots and conversational AI applications.
- Text summarization.
- Natural Language Processing (NLP) research and language learning tools.