unsloth/gemma-4-26B-A4B-it-qat-q4_0-unquantized
The unsloth/gemma-4-26B-A4B-it-qat-q4_0-unquantized model is a 26-billion-parameter, instruction-tuned multimodal language model from the Gemma 4 family, developed by Google DeepMind. It is optimized with Quantization-Aware Training (QAT) for efficient deployment while maintaining high quality. It excels in reasoning, coding, and multimodal understanding, and processes text and image inputs with a 256K-token context window.
Model Overview
This model is part of the Gemma 4 family developed by Google DeepMind. It uses a Mixture-of-Experts (MoE) architecture with 26 billion total parameters and is optimized with Quantization-Aware Training (QAT), which reduces memory requirements at deployment while preserving quality.
Key Capabilities
- Multimodal Understanding: Processes text and image inputs, with variable aspect ratio and resolution support. Video understanding is also supported by processing frame sequences.
- Reasoning: Designed with configurable thinking modes for step-by-step problem-solving.
- Extended Context Window: Features a 256K token context window for handling long and complex tasks.
- Efficient Architecture: The MoE design activates only 3.8 billion of its 26 billion parameters during inference, so it runs faster than its total parameter count would suggest.
- Enhanced Coding & Agentic Capabilities: Shows improvements in coding benchmarks and includes native function-calling support for autonomous agents.
- Multilingual Support: Pre-trained on over 140 languages with out-of-the-box support for 35+ languages.
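To make the efficiency claim in the list above concrete, the following is a minimal sketch of the arithmetic, using only the rounded parameter counts quoted on this page. The "about 2 FLOPs per participating parameter per token" figure is a common rule of thumb for a forward pass and is an assumption here, not a measured property of this model. The function names are hypothetical.

```python
# Illustrative arithmetic only; parameter counts are the rounded figures quoted above.
TOTAL_PARAMS = 26e9    # total parameters in the MoE model
ACTIVE_PARAMS = 3.8e9  # parameters activated during inference


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that take part in a forward pass."""
    return active / total


def approx_forward_flops_per_token(params: float) -> float:
    """Rule-of-thumb estimate (an assumption): ~2 FLOPs per participating parameter per token."""
    return 2.0 * params


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
moe_gflops = approx_forward_flops_per_token(ACTIVE_PARAMS) / 1e9
dense_gflops = approx_forward_flops_per_token(TOTAL_PARAMS) / 1e9

print(f"active fraction: {frac:.1%}")                            # active fraction: 14.6%
print(f"MoE forward pass: ~{moe_gflops:.1f} GFLOPs/token")       # ~7.6
print(f"dense equivalent: ~{dense_gflops:.1f} GFLOPs/token")     # ~52.0
```

Under these assumptions, per-token compute tracks the 3.8 billion active parameters, while memory still has to hold all 26 billion, since any expert may be selected.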
Good For
- Applications requiring efficient multimodal processing (text and image).
- Reasoning-intensive tasks and agentic workflows.
- Code generation, completion, and correction.
- Deployment on consumer GPUs and workstations where memory efficiency is crucial.
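As a rough guide to the memory point above, the sketch below estimates weight storage for the 26 billion parameters at two precisions. It assumes the "q4_0" in the repository name refers to the llama.cpp Q4_0 block format (32 four-bit weights plus one 16-bit scale, about 4.5 bits per weight) and that the unquantized checkpoint is stored in 16-bit precision; neither is confirmed on this page. The estimate covers weights only and ignores the KV cache, activations, and any tensors kept at higher precision.

```python
# Weights-only memory estimate; all figures are approximations under stated assumptions.
TOTAL_PARAMS = 26e9  # rounded total parameter count quoted on this page

BITS_PER_WEIGHT = {
    "bf16": 16.0,  # assumed storage precision of the unquantized checkpoint
    "q4_0": 4.5,   # assumed llama.cpp Q4_0: 32 x 4-bit weights + one 16-bit scale per block
}


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
# bf16: 52.0 GB
# q4_0: 14.6 GB
```

On these numbers, only the 4-bit form would plausibly fit on a single 16 GB to 24 GB consumer GPU, and the remaining headroom for long contexts would depend on KV cache size.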