unsloth/gemma-3-4b-it-qat
The unsloth/gemma-3-4b-it-qat model is a 4.3-billion-parameter instruction-tuned variant of Google DeepMind's Gemma 3 family, trained with Quantization Aware Training (QAT). This multimodal model accepts text and image inputs (896x896 resolution, a fixed 256 tokens per image) with a 128K context window and generates text outputs. It handles a wide range of text generation and image understanding tasks, including question answering, summarization, and reasoning, while being optimized for deployment in resource-limited environments.
Gemma 3 4B Instruction-Tuned with QAT
This model is a 4.3-billion-parameter, instruction-tuned member of Google DeepMind's Gemma 3 family, optimized with Quantization Aware Training (QAT). The checkpoint itself is unquantized, but QAT lets it retain near-original quality when quantized to Q4_0, significantly reducing memory requirements compared to bfloat16 models.
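As a rough illustration of the savings, the weight memory for bfloat16 versus Q4_0 can be estimated from the parameter count alone. This sketch assumes GGUF's Q4_0 layout (a 2-byte fp16 scale plus 32 packed 4-bit values per block, i.e. 18 bytes per 32 weights); actual footprints also depend on embedding precision, KV cache, and runtime overhead:

```python
# Back-of-the-envelope weight-memory estimate for a 4.3B-parameter model.
# Assumption: GGUF Q4_0 stores each 32-weight block as 16 bytes of
# 4-bit values plus a 2-byte fp16 scale -> 18 bytes per 32 weights.
PARAMS = 4.3e9

bf16_bytes = PARAMS * 2            # 2 bytes per weight in bfloat16
q4_0_bytes = PARAMS / 32 * 18      # 18 bytes per 32-weight block

print(f"bfloat16: {bf16_bytes / 1e9:.1f} GB")   # ~8.6 GB
print(f"Q4_0:     {q4_0_bytes / 1e9:.1f} GB")   # ~2.4 GB
print(f"ratio:    {bf16_bytes / q4_0_bytes:.2f}x smaller")
```

The roughly 3.6x reduction in weight memory is what makes 8 GB laptops a realistic target for this model.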
Key Capabilities
- Multimodal Understanding: Handles both text and image inputs (896x896 resolution, 256 tokens per image) and generates text outputs.
- Extended Context Window: Features a large 128K token context window, enabling processing of extensive inputs.
- Multilingual Support: Trained on data covering over 140 languages.
- Diverse Task Performance: Well-suited for a variety of tasks including question answering, summarization, reasoning, and image analysis.
- Resource-Efficient Deployment: Its relatively small size and QAT optimization make it suitable for deployment on resource-limited devices such as laptops and desktops.
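Because each image consumes a fixed 256 tokens, the context budget is easy to reason about. A minimal sketch (taking 128K as 131,072 tokens; the helper function and example prompt sizes are hypothetical, for illustration only):

```python
# Context-budget arithmetic: each 896x896 image costs a fixed 256 tokens.
CONTEXT_WINDOW = 131_072   # 128K tokens
TOKENS_PER_IMAGE = 256

def remaining_text_budget(num_images: int, text_tokens: int = 0) -> int:
    """Tokens left over after accounting for images and prompt text."""
    used = num_images * TOKENS_PER_IMAGE + text_tokens
    if used > CONTEXT_WINDOW:
        raise ValueError("prompt exceeds the context window")
    return CONTEXT_WINDOW - used

# e.g. 8 images plus a 2,000-token text prompt:
print(remaining_text_budget(8, 2_000))   # -> 127024
```

At 256 tokens per image, the window could in principle hold 512 images with no text at all (131,072 / 256), so in practice text length is usually the binding constraint.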
What Makes This Model Different?
This specific model leverages Quantization Aware Training (QAT) to deliver quality comparable to the unquantized checkpoint while drastically cutting its memory footprint. It belongs to the Gemma 3 family of open models, built from the same research and technology as Google's Gemini models, offering advanced multimodal capabilities and a large context window in a more accessible package.