unsloth/gemma-3-4b-it-qat-int4
The unsloth/gemma-3-4b-it-qat-int4 model is a 4.3-billion-parameter instruction-tuned variant of Google DeepMind's multimodal Gemma 3 family, optimized for efficient deployment. This version uses Quantization Aware Training (QAT) to preserve quality under int4 quantization while significantly reducing memory requirements. It handles text generation and image understanding tasks, supports a 128K-token context window and over 140 languages, and is well suited to resource-constrained environments.
Overview
This model is an instruction-tuned 4.3 billion parameter variant from Google DeepMind's Gemma 3 family, designed for efficient deployment through Quantization Aware Training (QAT). It maintains quality comparable to bfloat16 while drastically reducing memory footprint when quantized to int4. Gemma 3 models are multimodal, processing both text and image inputs to generate text outputs, and feature open weights.
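Below is a minimal loading sketch using the Hugging Face transformers library. The class and argument choices (Gemma3ForConditionalGeneration, AutoProcessor, device_map, compute dtype) are assumptions based on the standard Gemma 3 integration in recent transformers releases; consult the model repository for the exact recommended usage.

```python
# Minimal sketch: loading the int4 QAT checkpoint with transformers (assumed API).
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "unsloth/gemma-3-4b-it-qat-int4"

# Processor bundles the tokenizer and the image preprocessor (896x896 normalization).
processor = AutoProcessor.from_pretrained(model_id)

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # compute dtype; the stored weights are int4-quantized
    device_map="auto",           # place layers automatically on available GPU/CPU
)
```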
Key Capabilities
- Multimodal Input: Handles text strings and images (normalized to 896x896 resolution and encoded to 256 tokens each); see the usage sketch after this list.
- Large Context Window: Supports a total input context of 128K tokens and an output context of 8192 tokens.
- Multilingual Support: Trained on data including content in over 140 languages.
- Efficient Deployment: QAT enables near-bfloat16 quality with int4 quantization, making the model deployable on resource-limited hardware such as laptops as well as standard cloud infrastructure.
- Broad Task Suitability: Well-suited for text generation and image understanding tasks, including question answering, summarization, and reasoning.
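The sketch below illustrates a combined image-plus-text prompt, reusing the model and processor from the loading example above. The chat-template message structure and generation settings are assumptions drawn from typical Gemma 3 examples, not instructions from this model card.

```python
# Minimal multimodal sketch (assumed message format and processor behavior).
from PIL import Image

image = Image.open("example.jpg")  # hypothetical local image file

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# The processor resizes the image to 896x896 and encodes it as 256 image tokens.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Output window is 8192 tokens per the capabilities above; keep generation short here.
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```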
Training & Evaluation
The 4B model was trained on 4 trillion tokens spanning web documents, code, mathematics, and images. Training ran on Google's TPU hardware (TPUv4p, TPUv5p, TPUv5e) using the JAX and ML Pathways software stack. Evaluation benchmarks cover reasoning, factuality, STEM, code, multilingual capability, and multimodal understanding, where the model performs competitively for its size.