google/gemma-4-26B-A4B-it-qat-q4_0-unquantized
google/gemma-4-26B-A4B-it-qat-q4_0-unquantized is a 26-billion-parameter multimodal Mixture-of-Experts (MoE) language model from Google DeepMind's Gemma 4 family. It was trained with Quantization-Aware Training (QAT), activates about 3.8 billion parameters per token for efficient inference, and supports a 256K-token context window. It accepts text and image inputs, produces text output, and targets reasoning, agentic workflows, coding, and multimodal understanding.
Overview of Gemma 4 26B A4B MoE
This model is a 26-billion-parameter multimodal Mixture-of-Experts (MoE) variant from Google DeepMind's Gemma 4 family, trained with Quantization-Aware Training (QAT). Although it has 26B parameters in total, only about 3.8 billion are active for each token, which keeps per-token compute close to that of a much smaller dense model. The model supports a 256K-token context window and accepts both text and image inputs, producing text output.
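The practical meaning of "26B total, 3.8B active" can be seen with some back-of-envelope arithmetic. This sketch assumes the common rule of thumb of roughly 2 FLOPs per parameter per generated token (ignoring attention over the context) and, for the quantized size, the GGML Q4_0 layout (blocks of 32 four-bit weights plus one fp16 scale, about 4.5 bits per weight) that the `q4_0` in the model name appears to refer to. The figures cover weights only, not the KV cache or activations, and real quantized files usually keep some tensors at higher precision.

```python
# Back-of-envelope sketch: compute scales with ACTIVE parameters,
# weight memory scales with TOTAL parameters.
# Assumptions: ~2 FLOPs per parameter per token for a forward pass
# (attention over the context is ignored) and the rounded 26B / 3.8B figures.

TOTAL_PARAMS = 26e9    # all experts plus shared layers (rounded)
ACTIVE_PARAMS = 3.8e9  # parameters used for any single token


def forward_flops_per_token(params: float) -> float:
    """Approximate forward-pass FLOPs per token (one multiply + one add per weight)."""
    return 2.0 * params


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed just to store the weights at a given precision."""
    return params * bits_per_weight / 8.0


moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense 26B model

print(f"MoE (3.8B active): ~{moe_flops / 1e9:.1f} GFLOPs/token")
print(f"Dense 26B:         ~{dense_flops / 1e9:.1f} GFLOPs/token")
print(f"Compute ratio:     ~{dense_flops / moe_flops:.1f}x")

# Routing does not shrink memory: every expert must stay resident.
# 16 bits = bf16; 4.5 bits = assumed effective size of GGML Q4_0
# (32 four-bit values + one fp16 scale = 18 bytes per 32 weights).
for label, bits in [("bf16", 16.0), ("Q4_0", 4.5)]:
    print(f"{label}: ~{weight_bytes(TOTAL_PARAMS, bits) / 1e9:.1f} GB of weights")
```

Under these assumptions, per-token compute is roughly 6.8 times lower than for a hypothetical 26B dense model, while weight memory still tracks the full 26B: about 52 GB in bf16 versus about 14.6 GB at Q4_0.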
Key Capabilities
- Multimodal Understanding: Processes text and images at variable aspect ratios and resolutions, and accepts interleaved inputs that freely mix text and images within a single prompt.
- Reasoning: Features configurable thinking modes, allowing the model to perform step-by-step reasoning before generating an answer.
- Efficient Architecture: As an MoE model, it activates only a subset of its parameters (about 3.8B) for each token, so per-token compute is far lower than for a dense model of comparable total size. All 26B parameters must still be held in memory.
- Enhanced Coding & Agentic Capabilities: Shows improved performance in coding benchmarks and includes native function-calling support for autonomous agents.
- Long Context: Supports a 256K token context window, enabling processing of extensive inputs.
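The Efficient Architecture point rests on sparse routing. The following toy top-k MoE layer is a sketch of the general technique, not Gemma 4's implementation: the expert count, top-k value, hidden size, and the softmax-over-selected-experts gating are all illustrative assumptions. A router scores every expert for a token, only the k best-scoring experts run, and their outputs are mixed by the gate weights.

```python
import math
import random

# Toy top-k Mixture-of-Experts layer (illustrative only, NOT Gemma 4's code).
# All sizes below are hypothetical and far smaller than any real model.
NUM_EXPERTS = 8  # hypothetical number of experts
TOP_K = 2        # hypothetical number of experts run per token
DIM = 4          # hypothetical hidden size

random.seed(0)  # deterministic toy weights


def rand_matrix(rows, cols):
    """Random weight matrix stored as a list of rows."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs):
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


router = rand_matrix(NUM_EXPERTS, DIM)                         # one score row per expert
experts = [rand_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]  # one tiny linear "expert" each


def moe_forward(token):
    """Route one token vector to its top-k experts and mix their outputs."""
    scores = matvec(router, token)
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    gates = softmax([scores[i] for i in chosen])  # assumption: renormalise over chosen experts
    out = [0.0] * DIM
    for gate, idx in zip(gates, chosen):
        y = matvec(experts[idx], token)  # only the chosen experts' weights are used
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


token = [1.0, -0.5, 0.25, 2.0]
output, used = moe_forward(token)
params_per_expert = DIM * DIM
print("experts used:", sorted(used))
print(f"active expert params: {TOP_K * params_per_expert} of {NUM_EXPERTS * params_per_expert}")
```

With 8 experts and k = 2, each token touches a quarter of the expert weights. The same mechanism lets a 26B-parameter model run with about 3.8B active parameters, although shared layers such as attention are always active, so the real ratio is not simply k divided by the expert count.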
Good For
- Reasoning-intensive tasks: Benefits from its built-in reasoning mode for complex problem-solving.
- Agentic workflows: Native function-calling support makes it suitable for building autonomous agents.
- Coding tasks: Excels in code generation, completion, and correction.
- Multimodal applications: Ideal for scenarios requiring understanding of both text and image inputs, such as document parsing, UI understanding, and image captioning.
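- Efficient deployment: With only 3.8B active parameters, its MoE architecture runs faster than dense models of similar total size. Because all 26B weights must still fit in memory, quantization (such as the Q4_0 format referenced in this checkpoint's name) helps make consumer GPUs and workstations practical targets.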
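For the agentic workflows listed above, the application still has to execute whatever function the model requests. This is a minimal sketch of that dispatch step, assuming the model emits a bare JSON object with `name` and `arguments` fields; Gemma 4's actual tool-call syntax is defined by its chat template and is not covered here, and `get_weather` and the `TOOLS` registry are hypothetical.

```python
import json

# Application-side half of a function-calling loop.
# ASSUMPTION: the model's reply is a JSON object {"name": ..., "arguments": {...}}.
# Gemma 4's real tool-call format comes from its chat template and may differ.


def get_weather(city: str) -> dict:
    """Hypothetical tool; a real agent would call an external API here."""
    return {"city": city, "forecast": "sunny", "temp_c": 21}


TOOLS = {"get_weather": get_weather}  # hypothetical tool registry


def dispatch(model_reply: str) -> str:
    """Parse one tool call, run it, and return a JSON result to feed back to the model."""
    try:
        call = json.loads(model_reply)
    except json.JSONDecodeError:
        return json.dumps({"error": "reply was not valid JSON"})
    if not isinstance(call, dict):
        return json.dumps({"error": "tool call must be a JSON object"})
    fn = TOOLS.get(call.get("name"))
    if fn is None:
        return json.dumps({"error": f"unknown tool: {call.get('name')}"})
    try:
        result = fn(**call.get("arguments", {}))
    except TypeError as exc:  # missing, extra, or non-mapping arguments
        return json.dumps({"error": str(exc)})
    return json.dumps({"name": call["name"], "result": result})


print(dispatch('{"name": "get_weather", "arguments": {"city": "Zurich"}}'))
```

Returning errors as JSON instead of raising lets the agent loop pass the failure back to the model so it can correct its call.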