unsloth/gemma-4-26B-A4B-it-qat-q4_0-unquantized

  • Tags: Vision, Open Weights
  • Concurrency Cost: 2
  • Model Size: 26B
  • Quant: FP8
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Jun 5, 2026
  • License: apache-2.0
  • Architecture: Transformer

The unsloth/gemma-4-26B-A4B-it-qat-q4_0-unquantized model is a 26-billion-parameter, instruction-tuned multimodal language model from the Gemma 4 family, developed by Google DeepMind. It was optimized with Quantization-Aware Training (QAT) so that it can be deployed efficiently while preserving quality. It is designed for reasoning, coding, and multimodal understanding, and it accepts text and image inputs with a 256K-token context window.


Model Overview

This model is part of the Gemma 4 family from Google DeepMind and uses a 26-billion-parameter Mixture-of-Experts (MoE) architecture. It was trained with Quantization-Aware Training (QAT), which exposes the model to quantization error during training so that it loses less quality when its weights are later stored at low precision, reducing memory requirements for deployment.
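To make the QAT idea concrete, here is a minimal forward-pass sketch of Q4_0-style fake quantization, assuming the GGML/llama.cpp Q4_0 layout (blocks of 32 weights, one scale per block, 4-bit codes offset by 8). In QAT, a quantize-then-dequantize step like this is applied to the weights in the forward pass so the model learns to tolerate the rounding error; the backward pass usually treats it as the identity (a straight-through estimator), which this forward-only sketch omits. The function names are hypothetical, and this is not Gemma's actual training code.

```python
# Sketch of Q4_0-style fake quantization (quantize, then dequantize).
# Assumption: GGML/llama.cpp Q4_0 layout; not Gemma's actual QAT code.

BLOCK_SIZE = 32  # weights per block; each block shares one scale


def quantize_block_q4_0(block):
    """Map a block of floats to (scale, list of 4-bit codes in 0..15)."""
    # Signed value with the largest magnitude; it maps to code 0 (i.e. -8).
    amax = max(block, key=abs)
    scale = amax / -8.0
    inv_scale = 1.0 / scale if scale != 0.0 else 0.0
    # x * inv_scale lies in [-8, 8]; adding 8.5 and truncating rounds it to a code.
    codes = [min(15, max(0, int(x * inv_scale + 8.5))) for x in block]
    return scale, codes


def dequantize_block_q4_0(scale, codes):
    """Recover approximate floats from a scale and 4-bit codes."""
    return [(q - 8) * scale for q in codes]


def fake_quantize_q4_0(weights):
    """Apply block-wise Q4_0 quantize/dequantize to a flat list of weights."""
    out = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        scale, codes = quantize_block_q4_0(block)
        out.extend(dequantize_block_q4_0(scale, codes))
    return out


if __name__ == "__main__":
    w = [i / 10 for i in range(-16, 16)]
    wq = fake_quantize_q4_0(w)
    print("max abs error:", max(abs(a - b) for a, b in zip(w, wq)))
```

Each block can represent only 16 distinct values, and the per-weight error is bounded by the block's scale, which is why QAT's exposure to this error during training matters.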

Key Capabilities

  • Multimodal Understanding: Processes text and image inputs at variable aspect ratios and resolutions. Video understanding is also supported by processing sequences of frames.
  • Reasoning: Designed with configurable thinking modes for step-by-step problem-solving.
  • Extended Context Window: Features a 256K token context window for handling long and complex tasks.
  • Efficient Architecture: The MoE design activates only about 3.8 billion parameters per token during inference, so per-token compute is far lower than for a dense model of the same total size, although all 26 billion parameters must still be held in memory.
  • Enhanced Coding & Agentic Capabilities: Shows improvements in coding benchmarks and includes native function-calling support for autonomous agents.
  • Multilingual Support: Pre-trained on data in over 140 languages, with out-of-the-box support for 35+ languages.
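The efficiency of the MoE architecture comes from sparse routing: a small router scores every expert for each token, and only the top-scoring experts actually run. The sketch below is a generic top-k router in plain Python; the expert count, `top_k=2`, and the function names are illustrative assumptions, not Gemma 4's actual configuration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, experts, top_k=2):
    """Route one token through the top_k highest-scoring experts.

    x: the token's hidden vector (a list of floats here).
    router_logits: one router score per expert for this token.
    experts: list of callables mapping a vector to a vector.
    Returns (output vector, indices of the experts that ran).
    """
    # Rank experts by router score and keep only the top_k.
    ranked = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize the selected experts' scores into mixing weights.
    weights = softmax([router_logits[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)  # unselected experts are never called
        out = [o + w * v for o, v in zip(out, y)]
    return out, chosen
```

Only the selected experts execute, which is where the compute savings come from. Using the card's figures, 3.8 / 26 ≈ 0.15, so roughly 15% of the parameters participate in each token's forward pass; that ratio also includes shared components such as attention layers and embeddings, so it is not simply `top_k` divided by the number of experts.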

Good For

  • Applications requiring efficient multimodal processing (text and image).
  • Reasoning-intensive tasks and agentic workflows.
  • Code generation, completion, and correction.
  • Deployment on consumer GPUs and workstations where memory efficiency is crucial.
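For a rough sense of why quantization matters on consumer hardware, the weight footprint can be estimated as parameters × bits per weight / 8. The sketch below assumes exactly 26e9 parameters (the "26B" label is nominal), counts weights only (no KV cache, activations, or runtime overhead), and assumes Q4_0 costs 4.5 bits per weight (4-bit codes plus one 16-bit scale per 32-weight block). Because this is an MoE model, the estimate uses the total parameter count, not the ~3.8B active per token.

```python
# Back-of-the-envelope weight-memory estimate (weights only).
# Assumptions: 26e9 parameters; Q4_0 = 4 bits + a 16-bit scale per 32 weights.

PARAMS = 26e9

BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "fp8": 8.0,
    "q4_0": 4.0 + 16.0 / 32.0,  # 4.5 effective bits per weight
}


def weight_memory_gb(params, bits_per_weight):
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for fmt, bits in BITS_PER_WEIGHT.items():
        print(f"{fmt:>5}: {weight_memory_gb(PARAMS, bits):6.1f} GB")
```

Under these assumptions the weights need 52 GB in bf16, 26 GB in FP8, and about 14.6 GB in Q4_0; real deployments need additional memory for the KV cache, which grows with context length.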