mhenrichsen/gemma-7b

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 8.5B parameters
  • Quantization: FP8
  • Context length: 8K
  • Published: Feb 21, 2024
  • Architecture: Transformer

mhenrichsen/gemma-7b is a reupload of an 8.5-billion-parameter, text-to-text, decoder-only large language model from Google's Gemma family, built with the same research and technology as the Gemini models. It is designed for a variety of text generation tasks, including question answering, summarization, and reasoning. Its relatively small size and open weights make it suitable for deployment in resource-limited environments such as laptops or desktops, democratizing access to advanced AI capabilities.


Overview

mhenrichsen/gemma-7b is a reupload of Google's Gemma 7B base model, part of a family of lightweight, open-weight, text-to-text, decoder-only large language models. Developed by Google, these models are built using the same research and technology as the Gemini models. The 7B variant is pre-trained and available in English, offering capabilities for various text generation tasks.
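Since this is a reupload of the standard Gemma 7B checkpoint, it can presumably be loaded with the Hugging Face `transformers` library like the upstream model. A minimal inference sketch, assuming the checkpoint keeps the original format and that `transformers` and `torch` are installed (generation settings below are illustrative, not official recommendations):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mhenrichsen/gemma-7b"  # reupload of Google's Gemma 7B base model


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Complete a prompt with the base (non-instruction-tuned) model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # bf16 keeps the 8.5B weights manageable
        device_map="auto",           # place layers on available GPU(s)/CPU
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("The Transformer architecture works by"))
```

As a base model, it continues text rather than following instructions, so prompts should be phrased as completions.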

Key Capabilities

  • Text Generation: Excels at tasks such as question answering, summarization, and reasoning.
  • Resource Efficiency: Its relatively small size (8.5B parameters) allows for deployment on devices with limited resources, including laptops and desktops.
  • Fine-tuning Support: Provides examples and scripts for supervised fine-tuning (SFT) using techniques like QLoRA and FSDP.
  • Hardware Optimization: Trained on Google's Tensor Processing Unit (TPUv5e) hardware, leveraging JAX and ML Pathways for efficient training.
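The QLoRA fine-tuning mentioned above combines 4-bit quantization of the frozen base weights with low-rank adapters on the attention projections. A sketch of a typical hyperparameter set for a 7B-class model, expressed as a plain dictionary; the option names mirror common `bitsandbytes`/`peft` settings, and the specific values are illustrative assumptions, not the upstream repository's configuration:

```python
# Illustrative QLoRA hyperparameters for SFT on a 7B-class model.
# Values are assumptions for demonstration, not official settings.
qlora_config = {
    # bitsandbytes-style 4-bit quantization of the frozen base weights
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",          # NormalFloat4 quantization
    "bnb_4bit_compute_dtype": "bfloat16",  # compute in bf16
    # peft-style LoRA adapter settings
    "lora_r": 16,            # adapter rank
    "lora_alpha": 32,        # scaling factor
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}
```

Only the small adapter matrices are trained, which is what makes fine-tuning an 8.5B-parameter model feasible on a single consumer GPU.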

Performance Highlights

The 7B model demonstrates strong performance across various benchmarks, including:

  • MMLU: 64.3 (5-shot, top-1)
  • HellaSwag: 81.2 (0-shot)
  • HumanEval: 32.3 (pass@1)
  • GSM8K: 46.4 (maj@1)

Intended Usage

This model is well-suited for:

  • Content creation (poems, scripts, code, marketing copy).
  • Chatbots and conversational AI applications.
  • Text summarization.
  • Natural Language Processing (NLP) research and language learning tools.
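Because this is a base model rather than an instruction-tuned one, tasks like summarization work best as completion-style prompts. A minimal sketch of such a template; the exact wording is a hypothetical example, not an official Gemma prompt format:

```python
def make_summarization_prompt(text: str) -> str:
    """Build a completion-style summarization prompt for a base LLM.

    The template is an illustrative assumption: the model is expected to
    continue the text after "Summary:" rather than follow an instruction.
    """
    return f"Article:\n{text}\n\nSummary:"


if __name__ == "__main__":
    prompt = make_summarization_prompt("Gemma is a family of open models.")
    print(prompt)
```

The same pattern (context followed by a labeled continuation point) adapts readily to question answering or content creation.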