unsloth/gemma-3-12b-it-qat

Public · Vision · 12B parameters · FP8 (as served) · 32,768-token serving context
Released: Apr 21, 2025
License: gemma
Source: Hugging Face
Overview

Gemma 3 12B Instruction-Tuned with QAT

This model is the 12-billion-parameter instruction-tuned version of Google DeepMind's Gemma 3, enhanced with Quantization Aware Training (QAT). Gemma 3 models are multimodal: they accept both text and image inputs (images are normalized to 896x896 resolution and encoded to 256 tokens each) and generate text outputs. This particular checkpoint is stored unquantized, but the QAT procedure is designed to preserve near-bfloat16 quality when the weights are quantized to Q4_0, significantly reducing the memory footprint for deployment.
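The memory saving from Q4_0 quantization can be estimated with simple weight-only arithmetic. The sketch below is a back-of-the-envelope estimate (Q4_0 stores 4-bit weights plus one fp16 scale per 32-weight block, i.e. roughly 4.5 bits per weight) and deliberately ignores activations and the KV cache:

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB for a model with n_params parameters."""
    return n_params * bits_per_param / 8 / 2**30

N = 12e9  # ~12B parameters (approximate)

bf16 = weight_memory_gib(N, 16)   # bfloat16: 16 bits per weight -> ~22.4 GiB
q4_0 = weight_memory_gib(N, 4.5)  # Q4_0: (32*4 + 16) / 32 = 4.5 bits/weight -> ~6.3 GiB

print(f"bf16: {bf16:.1f} GiB, Q4_0: {q4_0:.1f} GiB")
```

This roughly 3.5x reduction in weight storage is what makes the QAT checkpoint practical on consumer hardware; actual runtime memory will be somewhat higher once activations and the KV cache are included.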

Key Capabilities

  • Multimodal Understanding: Accepts text and image inputs and generates text responses, including analysis of image content and summaries.
  • Extended Context Window: Features a large 128K token input context window, allowing for processing of extensive documents and complex prompts.
  • Multilingual Support: Trained on data in over 140 languages, enabling broad linguistic application.
  • Resource-Efficient Deployment: QAT enables near bfloat16 performance with reduced memory, suitable for laptops, desktops, or private cloud infrastructure.
  • Diverse Task Performance: Well-suited for question answering, summarization, reasoning, and content creation.

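For multimodal prompting, inputs are typically assembled as a structured chat message list. The sketch below follows the message schema used by Hugging Face transformers' multimodal chat templates; the exact fields accepted for Gemma 3 (and the helper function itself) are assumptions to be checked against the installed transformers version:

```python
def build_messages(image_url: str, question: str) -> list[dict]:
    """Build a chat message list in the transformers-style multimodal format.

    Hypothetical helper for illustration: one user turn containing an image
    reference followed by a text question.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_messages("https://example.com/photo.png", "Describe this image.")
```

Such a message list would then be passed to the model's processor (e.g. via `apply_chat_template`), which handles the 896x896 image normalization and 256-token encoding described above.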
Performance Highlights (Original Checkpoint)

Evaluations on the original Gemma 3 12B model demonstrate strong performance across various benchmarks:

  • Reasoning & Factuality: Achieved 84.2 on HellaSwag (10-shot), 78.8 on BoolQ (0-shot), and 72.6 on BIG-Bench Hard (few-shot).
  • STEM & Code: Scored 74.5 on MMLU (5-shot), 71.0 on GSM8K (8-shot), and 45.7 on HumanEval (0-shot).
  • Multimodal: Scored 111 on COCO Captions (COCOcap) and 75.2 on AI2D.

Intended Usage

This model is designed for a wide range of applications, including content creation (poems, scripts, marketing copy), chatbots, text summarization, image data extraction, and research in natural language processing and vision-language models. Its smaller size and QAT optimization make state-of-the-art capabilities accessible to developers with limited compute and memory budgets.