ibyteohdear/gemma-3-12b-it-qat-q4_0-unquantized
The ibyteohdear/gemma-3-12b-it-qat-q4_0-unquantized model is a 12-billion-parameter, instruction-tuned variant of Google DeepMind's Gemma 3 family, built from the same research and technology as the Gemini models. This multimodal model handles text and image inputs (images normalized to 896x896 resolution and encoded to 256 tokens each) with a 128K-token input context window, generating text outputs. The checkpoint was produced with quantization-aware training (QAT), so quantizing it to Q4_0 preserves quality close to the bfloat16 original while significantly reducing memory requirements, making it suitable for text generation, image understanding, question answering, summarization, and reasoning tasks on resource-limited devices.
Gemma 3 12B Instruction-Tuned (QAT)
This model is a 12-billion-parameter, instruction-tuned variant from Google DeepMind's Gemma 3 family, leveraging the same research and technology as the Gemini models. It is notable for its quantization-aware training (QAT), which lets the quantized model preserve quality close to bfloat16 while drastically reducing its memory footprint, making it efficient to deploy.
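As a concrete illustration, the checkpoint can be loaded with 4-bit weights through transformers. Below is a minimal sketch, assuming a transformers release with Gemma 3 support (v4.50 or later) plus accelerate and bitsandbytes installed; note that bitsandbytes 4-bit (NF4) loading only approximates the Q4_0 format the QAT run targeted, since the exact quantization kernels differ between runtimes.

```python
# Minimal sketch: load the QAT checkpoint with 4-bit weights.
# Assumes transformers >= 4.50 (Gemma 3 support), accelerate, and bitsandbytes.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration, BitsAndBytesConfig

model_id = "ibyteohdear/gemma-3-12b-it-qat-q4_0-unquantized"

# NF4 approximates the Q4_0 target of the QAT run; kernels differ by runtime.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers across whatever devices are available
)
```

Back-of-the-envelope: 12B parameters at 4 bits is roughly 6 GB of weights (plus runtime overhead), versus about 24 GB in bfloat16, which is what makes laptop-class deployment realistic.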
Key Capabilities
- Multimodal: Processes both text and image inputs (images normalized to 896x896 resolution, encoded to 256 tokens each); see the inference sketch after this list.
- Extensive Context: Supports a total input context of 128K tokens.
- Multilingual Support: Trained on data spanning more than 140 languages.
- Diverse Task Performance: Excels in text generation, image understanding, question answering, summarization, and reasoning.
- Optimized for Efficiency: QAT enables deployment in environments with limited resources like laptops or desktops.
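The multimodal flow referenced above can be sketched as follows, continuing from the loading example and assuming the same transformers setup; the image URL and prompt are placeholders.

```python
# Minimal sketch: image + text chat via the processor's chat template.
# Continues from the loading sketch; URL and prompt are placeholders.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Gemma 3 normalizes each image to 896x896 and represents it as 256 tokens,
# so one image consumes a small, fixed share of the context window.
output_ids = model.generate(**inputs, max_new_tokens=128)
answer = processor.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:],  # drop the echoed prompt
    skip_special_tokens=True,
)
print(answer)
```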
Training and Performance
The 12B model was trained on 12 trillion tokens, encompassing web documents, code, mathematics, and images. It demonstrates strong performance across various benchmarks, including:
- Reasoning: Achieves 84.2 on HellaSwag (10-shot) and 72.6 on BIG-Bench Hard (few-shot).
- STEM & Code: Scores 74.5 on MMLU (5-shot) and 45.7 on HumanEval (0-shot).
- Multilingual: Reaches 64.3 on MGSM and 69.4 on Global-MMLU-Lite.
- Multimodal: Achieves 71.2 on VQAv2 and 50.3 on MMMU.
Good for
- Applications requiring efficient multimodal processing (text and image).
- Deploying powerful language models on resource-constrained devices due to QAT optimization (see the local-inference sketch after this list).
- Tasks involving multilingual text generation and understanding.
- Developing solutions for question answering, summarization, and complex reasoning.
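For fully local inference, one common route is converting the checkpoint to GGUF, quantizing it to Q4_0 with llama.cpp's tooling, and serving it through llama-cpp-python. A minimal text-only sketch, assuming such a Q4_0 GGUF export already exists at the hypothetical path below:

```python
# Minimal sketch: CPU-friendly local inference via llama-cpp-python.
# Assumes a Q4_0 GGUF export of this checkpoint; the path is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./gemma-3-12b-it-q4_0.gguf",  # hypothetical local export
    n_ctx=8192,  # context allocated for this session, not the model's maximum
)

result = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Summarize quantization-aware training in two sentences."}
    ],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])
```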