Overview
Overview
Gemma 3 is a family of lightweight, state-of-the-art open models developed by Google DeepMind, built using the same research and technology as the Gemini models. The unsloth/gemma-3-12b-it variant is a 12 billion parameter instruction-tuned model featuring a large 128K context window and multilingual support across over 140 languages. It is multimodal, capable of processing both text and image inputs (normalized to 896x896 resolution) to generate text outputs.
Key Capabilities
- Multimodal Understanding: Processes text and images to generate relevant text, suitable for tasks like image analysis and visual data extraction.
- Extensive Context Window: Utilizes a 128K token context window for comprehensive understanding and generation.
- Multilingual Support: Trained on data in over 140 languages, enhancing its global applicability.
- Versatile Text Generation: Excels in tasks such as question answering, summarization, reasoning, creative text generation, and conversational AI.
- Resource-Efficient Deployment: Its relatively small size makes it suitable for deployment on devices with limited resources, including laptops and cloud infrastructure.
Performance Highlights
The Gemma 3 12B model demonstrates strong performance across various benchmarks:
- Reasoning: Achieved 84.2 on HellaSwag (10-shot) and 72.6 on BIG-Bench Hard (few-shot).
- STEM & Code: Scored 74.5 on MMLU (5-shot) and 45.7 on HumanEval (0-shot).
- Multimodal: Achieved 111 on COCOcap and 71.2 on VQAv2.
Good for
- Developers and researchers experimenting with multimodal AI.
- Applications requiring text generation from diverse inputs, including images.
- Deployment in environments where computational resources are a constraint.
- Building multilingual applications and tools.