Overview

This model is an instruction-tuned, 12-billion-parameter variant of Google DeepMind's Gemma 3 family, trained with Quantization Aware Training (QAT). The checkpoint provided here is unquantized, but it is intended to be quantized to Q4_0. Because QAT exposes the model to quantization during training, the Q4_0 version retains quality close to the bfloat16 model while using substantially less memory. Gemma 3 models are multimodal: they accept text and image inputs, generate text outputs, and support a 128K-token context window.
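To make "quantization to Q4_0" concrete, the sketch below quantizes one block of weights. It assumes the Q4_0 layout used by llama.cpp/GGML: weights are grouped into blocks of 32, each block stores one fp16 scale plus 32 4-bit codes, and a weight is recovered as (code - 8) * scale. The scale rule (largest-magnitude value divided by -8) is modeled on GGML's reference quantizer; the function names here are hypothetical and this is an illustration, not the tooling used to produce this model. The memory helper is a rough weights-only estimate using the nominal 12B parameter count.

```python
import struct

BLOCK_SIZE = 32  # Q4_0 groups weights into blocks of 32


def quantize_block_q4_0(block):
    """Quantize 32 floats into an fp16 scale and 32 codes in 0..15 (hypothetical helper)."""
    assert len(block) == BLOCK_SIZE
    # Signed value with the largest magnitude; it maps to code 0 (i.e. -8 after offset).
    max_val = max(block, key=abs)
    d = max_val / -8.0
    inv_d = 1.0 / d if d != 0 else 0.0
    # Offset by 8 and round half up; the opposite extreme would be 16, so clamp to 15.
    codes = [min(15, int(x * inv_d + 8.5)) for x in block]
    # The scale is stored as fp16, so round it the same way ("e" = IEEE half precision).
    d_fp16 = struct.unpack("<e", struct.pack("<e", d))[0]
    return d_fp16, codes


def dequantize_block_q4_0(d, codes):
    """Recover approximate weights as (code - 8) * scale."""
    return [(q - 8) * d for q in codes]


def pack_block_q4_0(d, codes):
    """Serialize a block: 2-byte fp16 scale, then 16 bytes holding two 4-bit codes each."""
    # Element j goes in the low nibble, element j + 16 in the high nibble.
    qs = bytes(codes[j] | (codes[j + 16] << 4) for j in range(BLOCK_SIZE // 2))
    return struct.pack("<e", d) + qs


def weight_memory_gb(n_params, bits_per_weight):
    """Weights-only estimate in GB; ignores KV cache, activations, and unquantized tensors."""
    return n_params * bits_per_weight / 8 / 1e9


# 18 bytes per 32 weights works out to 4.5 bits per weight.
Q4_0_BITS_PER_WEIGHT = (2 + BLOCK_SIZE // 2) * 8 / BLOCK_SIZE
```

With these assumptions, a nominal 12B-parameter model needs about `weight_memory_gb(12e9, 16)` = 24 GB for bfloat16 weights versus `weight_memory_gb(12e9, Q4_0_BITS_PER_WEIGHT)` = 6.75 GB for Q4_0 weights. Real files differ somewhat because not every tensor is quantized and the exact parameter count is not exactly 12 billion.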

Key Capabilities

Multimodal Processing: Handles text inputs (questions, prompts, documents) and images (normalized to 896x896 resolution, encoded to 256 tokens each).
Extensive Context: Supports a total input context of 128K tokens and generates outputs up to 8192 tokens.
Multilingual Support: Trained on data including content in over 140 languages.
Broad Task Performance: Well suited to text generation and image understanding tasks, including question answering, summarization, and reasoning.
Efficient Deployment: Once quantized to Q4_0, the QAT model's reduced memory footprint makes it practical to deploy in resource-limited environments such as laptops, desktops, or private cloud infrastructure.
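The per-image token cost and the context limit above determine how much room is left for text in a single request. The sketch below does that arithmetic under two stated assumptions: "128K" is taken to mean 128 × 1024 = 131,072 input tokens, and each image costs exactly 256 tokens. It ignores chat-template and special tokens, which consume a few more. The function name is hypothetical.

```python
CONTEXT_TOKENS = 128 * 1024  # assumption: "128K" = 131,072 input tokens
TOKENS_PER_IMAGE = 256       # each 896x896 image is encoded to 256 tokens
MAX_OUTPUT_TOKENS = 8192     # generation limit stated above


def text_token_budget(num_images, context_tokens=CONTEXT_TOKENS):
    """Input tokens left for text after reserving space for num_images images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    used = num_images * TOKENS_PER_IMAGE
    if used > context_tokens:
        raise ValueError("images alone exceed the context window")
    return context_tokens - used


# Upper bound on images in one input if no text were present.
MAX_IMAGES = CONTEXT_TOKENS // TOKENS_PER_IMAGE
```

For example, a prompt with 10 images leaves `text_token_budget(10)` = 128,512 tokens for text, and at most 512 images fit in the window before any text is added.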

Good for

Content Creation: Generating creative text formats, marketing copy, and email drafts.
Conversational AI: Powering chatbots, virtual assistants, and interactive applications.
Information Extraction: Summarizing text corpora and extracting/interpreting visual data.
Research & Education: Serving as a foundation for VLM/NLP research, language learning tools, and knowledge exploration.
