google/gemma-4-26B-A4B-it-qat-q4_0-unquantized

VISIONConcurrency Cost:2Model Size:26BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Apr 29, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The google/gemma-4-26B-A4B-it-qat-q4_0-unquantized model is a 26 billion parameter multimodal Mixture-of-Experts (MoE) language model developed by Google DeepMind, part of the Gemma 4 family. Optimized with Quantization-Aware Training (QAT), it features 3.8 billion active parameters for efficient inference and supports a 256K token context window. This model excels at reasoning, agentic workflows, coding, and multimodal understanding, processing text and image inputs to generate text outputs.

Loading preview...

Overview of Gemma 4 26B A4B MoE

This model is a 26 billion parameter multimodal Mixture-of-Experts (MoE) variant from Google DeepMind's Gemma 4 family, optimized with Quantization-Aware Training (QAT). It's designed to offer high performance with efficient inference, utilizing only 3.8 billion active parameters. The model supports a substantial 256K token context window and is capable of processing both text and image inputs to generate text outputs.

Key Capabilities

  • Multimodal Understanding: Processes text and images, with variable aspect ratio and resolution support. It can handle interleaved multimodal inputs, freely mixing text and images within a single prompt.
  • Reasoning: Features configurable thinking modes, allowing the model to perform step-by-step reasoning before generating an answer.
  • Efficient Architecture: As an MoE model, it activates a smaller subset of parameters (3.8B) during inference, making it faster than dense models of comparable total parameter count.
  • Enhanced Coding & Agentic Capabilities: Shows improved performance in coding benchmarks and includes native function-calling support for autonomous agents.
  • Long Context: Supports a 256K token context window, enabling processing of extensive inputs.

Good For

  • Reasoning-intensive tasks: Benefits from its built-in reasoning mode for complex problem-solving.
  • Agentic workflows: Native function-calling support makes it suitable for building autonomous agents.
  • Coding tasks: Excels in code generation, completion, and correction.
  • Multimodal applications: Ideal for scenarios requiring understanding of both text and image inputs, such as document parsing, UI understanding, and image captioning.
  • Efficient deployment: Its MoE architecture with 3.8B active parameters allows for faster inference compared to larger dense models, making it suitable for consumer GPUs and workstations.