Model Overview
RedHatAI/gemma-3-12b-it is an instruction-tuned variant of Google DeepMind's Gemma 3 family of open models, built using the same research and technology as the Gemini models. This 12 billion parameter model is multimodal, accepting both text and image inputs (images normalized to 896x896 resolution, encoded to 256 tokens each) and generating text outputs. It boasts a substantial 128K token context window and supports over 140 languages, making it versatile for global applications.
Key Capabilities
- Multimodal Understanding: Processes both text and image inputs for comprehensive analysis.
- Extensive Context Window: Utilizes a 128K token context for detailed and long-form interactions.
- Multilingual Support: Capable of understanding and generating text in over 140 languages.
- Broad Task Performance: Excels in text generation, image understanding, question answering, summarization, and reasoning tasks.
- Resource-Efficient Deployment: Its relatively small size allows for deployment on laptops, desktops, or private cloud infrastructure.
Good For
- Content Creation: Generating various text formats like poems, scripts, marketing copy, and email drafts.
- Conversational AI: Powering chatbots and virtual assistants for customer service or interactive applications.
- Research & Education: Serving as a foundation for VLM/NLP research, language learning tools, and knowledge exploration.
- Image Data Extraction: Interpreting and summarizing visual data for text-based communications.
Performance Highlights
The Gemma 3 12B model demonstrates strong performance across various benchmarks:
- Reasoning: Achieved 84.2 on HellaSwag (10-shot) and 72.6 on BIG-Bench Hard (few-shot).
- STEM & Code: Scored 74.5 on MMLU (5-shot) and 71.0 on GSM8K (8-shot).
- Multilingual: Reached 64.3 on MGSM and 69.4 on Global-MMLU-Lite.
- Multimodal: Achieved 111 on COCOcap and 82.3 on DocVQA (val).
Limitations
Users should be aware of potential limitations related to training data biases, context and task complexity, language ambiguity, factual accuracy (models are not knowledge bases), and common sense reasoning. Google DeepMind emphasizes ethical considerations, including bias mitigation, prevention of misinformation, and ensuring transparency and accountability.