Gemma-2b: A Lightweight, Open Model from Google
Gemma-2b is a 2.6-billion-parameter, decoder-only large language model from Google, built on the same research and technology as the Gemini models. It targets English text-to-text generation and ships with open weights in both pre-trained and instruction-tuned variants.
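For the instruction-tuned variant, prompts are wrapped in Gemma's turn markers. A minimal sketch, assuming the `<start_of_turn>`/`<end_of_turn>` chat template documented for the instruction-tuned checkpoint (in practice, the tokenizer's chat template in a library such as transformers applies this for you; verify against the model card before relying on the exact markers):

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a single user turn in Gemma's turn markers and open the
    model's turn, so generation continues as the assistant reply.

    Illustrative helper only (not part of any official API); the
    marker strings are an assumption based on Gemma's documented
    instruction-tuning format.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(build_gemma_prompt("Summarize this paragraph in one sentence."))
```

The returned string is what would be tokenized and fed to the model; the generated text that follows `<start_of_turn>model` is the reply.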
Key Capabilities
- Versatile Text Generation: Proficient in tasks such as question answering, summarization, and reasoning.
- Resource-Efficient Deployment: Its relatively small size allows for deployment on devices with limited resources, including laptops, desktops, or personal cloud infrastructure.
- Robust Training: Trained on a diverse dataset of 6 trillion tokens spanning web documents, code, and mathematical text, which strengthens its handling of varied tasks and formats.
- Responsible AI Focus: Incorporates rigorous data filtering for CSAM (child sexual abuse material) and sensitive personal information, alongside internal red-teaming and ethics and safety evaluations.
Benchmark Performance Highlights
- Achieves 42.3 on MMLU (5-shot, top-1) and 71.4 on HellaSwag (0-shot).
- Scores 22.0 on HumanEval (pass@1) and 17.7 on GSM8K (maj@1).
Good For
- Content Creation: Generating creative text formats, marketing copy, and email drafts.
- Conversational AI: Powering chatbots and virtual assistants.
- Research and Education: Serving as a foundation for NLP research, language learning tools, and knowledge exploration.
- Fine-tuning: Scripts and notebooks are provided for Supervised Fine-Tuning (SFT) using QLoRA or FSDP on various datasets.
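To make the QLoRA option concrete, here is a stdlib-only sketch of the low-rank update at the heart of LoRA (which QLoRA applies on top of a 4-bit-quantized base model). This is illustrative arithmetic only; real fine-tuning would go through a library such as peft, and the function names here are hypothetical:

```python
def lora_param_counts(d_in: int, d_out: int, r: int) -> tuple:
    """Return (full_params, lora_params) for one linear layer.

    LoRA freezes the d_out x d_in weight matrix W and trains only a
    rank-r pair A (r x d_in) and B (d_out x r), so the trainable
    count drops from d_out * d_in to r * (d_in + d_out).
    """
    return d_out * d_in, r * (d_in + d_out)


def apply_lora(W, A, B, alpha: float, r: int):
    """Compute W + (alpha / r) * B @ A on plain nested-list matrices."""
    scale = alpha / r
    out = [row[:] for row in W]  # copy the frozen base weights
    for i in range(len(W)):
        for j in range(len(W[0])):
            out[i][j] += scale * sum(B[i][k] * A[k][j] for k in range(r))
    return out


# A rank-8 adapter on a 2048x2048 layer trains well under 1% of its weights.
full, low = lora_param_counts(2048, 2048, 8)
print(full, low)
```

The same idea scales to every adapted layer of the model, which is why QLoRA fits fine-tuning of a 2B-parameter model into modest GPU memory.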