Overview
Gemma-2b is a 2.6 billion parameter, decoder-only large language model developed by Google, derived from the same research and technology as the Gemini models. It is available in English, with open weights, and comes in both pre-trained and instruction-tuned variants. The model supports a context length of 8192 tokens.
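The instruction-tuned variant expects prompts wrapped in Gemma's turn-based control tokens. The helper below is a minimal sketch of that format, assuming the `<start_of_turn>`/`<end_of_turn>` markers published for instruction-tuned Gemma; verify against the official model card before relying on it.

```python
def format_gemma_prompt(user_message: str) -> str:
    """Wrap a single user message in Gemma's instruction-tuned turn format.

    Assumes the turn markers documented for instruction-tuned Gemma:
    a user turn, an end-of-turn token, then an open model turn that the
    model completes.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


prompt = format_gemma_prompt("Summarize the following article in two sentences.")
```

In practice, tokenizers shipped with chat-tuned checkpoints often apply this template automatically (e.g. via a chat-template method), so hand-rolling it is mainly useful for understanding what the model actually sees.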
Key Capabilities
- Text Generation: Excels at various text generation tasks including question answering, summarization, and reasoning.
- Resource Efficiency: Its relatively small size allows for deployment in environments with limited resources, such as laptops, desktops, or personal cloud infrastructure.
- Fine-tuning Support: Provides scripts and notebooks for supervised fine-tuning (SFT) using methods like QLoRA and FSDP.
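As a sketch of what a QLoRA setup might look like with the Hugging Face stack: the two configuration fragments below combine 4-bit NF4 quantization with a low-rank adapter. The hyperparameter values are illustrative, and the `target_modules` names assume Gemma uses the standard attention projection naming (`q_proj`, `k_proj`, `v_proj`, `o_proj`); check the model's module names before fine-tuning.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# QLoRA-style 4-bit quantization of the frozen base weights.
# NF4 quant type and bfloat16 compute are common choices, not requirements.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters trained on top of the quantized base model.
# Rank, alpha, and dropout here are example values, not tuned settings.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed layout
    task_type="CAUSAL_LM",
)
```

These would be passed to `AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)` and `peft.get_peft_model(model, lora_config)` respectively, with an SFT trainer handling the training loop; FSDP sharding is configured separately at the trainer/accelerator level.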
Training Details
The model was trained on a diverse dataset totaling 6 trillion tokens, including web documents, code, and mathematical texts. Data preprocessing involved rigorous CSAM filtering, sensitive data filtering, and additional quality and safety filtering. Training was conducted on Google's latest generation Tensor Processing Unit (TPUv5e) hardware, utilizing JAX and ML Pathways software.
Benchmark Performance
Gemma-2b achieves an average benchmark score of 45.0 across a range of tasks, including 42.3 on MMLU (knowledge and reasoning), 71.4 on HellaSwag (commonsense inference), and 22.0 on HumanEval (code generation). These results indicate solid reasoning, commonsense, and coding ability relative to the model's size.
Intended Usage
- Content Creation: Generating creative text formats, marketing copy, and email drafts.
- Conversational AI: Powering chatbots and virtual assistants.
- Research & Education: Serving as a foundation for NLP research, language learning tools, and knowledge exploration.