unsloth/gemma-3-4b-it-qat
The unsloth/gemma-3-4b-it-qat model is a 4.3-billion-parameter instruction-tuned variant of Google DeepMind's Gemma 3 family, trained with Quantization Aware Training (QAT). This multimodal model accepts text and image inputs (896x896 resolution, a fixed 256 tokens per image) with a 128K context window and generates text outputs. It handles a wide range of text generation and image understanding tasks, including question answering, summarization, and reasoning, while being optimized for deployment in resource-limited environments.
Gemma 3 4B Instruction-Tuned with QAT
This model is a 4.3-billion-parameter, instruction-tuned member of Google DeepMind's Gemma 3 family, optimized with Quantization Aware Training (QAT). The checkpoint itself is unquantized, but QAT lets it retain near-original quality when quantized to Q4_0, significantly reducing memory requirements compared to bfloat16 models.
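As a rough illustration of the savings, the weight memory for bfloat16 versus Q4_0 can be estimated from the parameter count alone. This sketch assumes GGUF's Q4_0 layout (a 2-byte fp16 scale plus 32 packed 4-bit values per block, i.e. 18 bytes per 32 weights); actual footprints also depend on embedding precision, KV cache, and runtime overhead:

```python
# Back-of-the-envelope weight-memory estimate for a 4.3B-parameter model.
# Assumption: GGUF Q4_0 stores each 32-weight block as 16 bytes of
# 4-bit values plus a 2-byte fp16 scale -> 18 bytes per 32 weights.
PARAMS = 4.3e9

bf16_bytes = PARAMS * 2            # 2 bytes per weight in bfloat16
q4_0_bytes = PARAMS / 32 * 18      # 18 bytes per 32-weight block

print(f"bfloat16: {bf16_bytes / 1e9:.1f} GB")   # ~8.6 GB
print(f"Q4_0:     {q4_0_bytes / 1e9:.1f} GB")   # ~2.4 GB
print(f"ratio:    {bf16_bytes / q4_0_bytes:.2f}x smaller")
```

The roughly 3.6x reduction in weight memory is what makes 8 GB laptops a realistic target for this model.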
Key Capabilities
- Multimodal Understanding: Handles both text and image inputs (896x896 resolution, 256 tokens per image) and generates text outputs.
- Extended Context Window: Features a large 128K token context window, enabling processing of extensive inputs.
- Multilingual Support: Trained on data covering over 140 languages.
- Diverse Task Performance: Well-suited for a variety of tasks including question answering, summarization, reasoning, and image analysis.
- Resource-Efficient Deployment: Its relatively small size and QAT optimization make it suitable for deployment on resource-limited devices such as laptops and desktops.
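Because each image consumes a fixed 256 tokens, the context budget is easy to reason about. A minimal sketch (taking 128K as 131,072 tokens; the helper function and example prompt sizes are hypothetical, for illustration only):

```python
# Context-budget arithmetic: each 896x896 image costs a fixed 256 tokens.
CONTEXT_WINDOW = 131_072   # 128K tokens
TOKENS_PER_IMAGE = 256

def remaining_text_budget(num_images: int, text_tokens: int = 0) -> int:
    """Tokens left over after accounting for images and prompt text."""
    used = num_images * TOKENS_PER_IMAGE + text_tokens
    if used > CONTEXT_WINDOW:
        raise ValueError("prompt exceeds the context window")
    return CONTEXT_WINDOW - used

# e.g. 8 images plus a 2,000-token text prompt:
print(remaining_text_budget(8, 2_000))   # -> 127024
```

At 256 tokens per image, the window could in principle hold 512 images with no text at all (131,072 / 256), so in practice text length is usually the binding constraint.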
What Makes This Model Different?
This specific model leverages Quantization Aware Training (QAT) to deliver quality comparable to the unquantized checkpoint while drastically cutting its memory footprint. It belongs to the Gemma 3 family of open models, built from the same research and technology as Google's Gemini models, offering advanced multimodal capabilities and a large context window in a more accessible package.