Overview
google/gemma-4-26B-A4B-it is a 25.2-billion-total-parameter Mixture-of-Experts (MoE) model from Google DeepMind's Gemma 4 family, designed for efficient inference: only 3.8 billion parameters are active per token. It is a multimodal model that accepts text, image, and video inputs and generates text outputs. The model features a substantial 256K-token context window, enabling it to handle complex, long-context tasks effectively. Gemma 4 models are built with a hybrid attention mechanism that combines local sliding-window attention with global attention to reduce processing time and memory footprint.
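The trade-off behind hybrid attention can be illustrated with toy attention masks (a conceptual NumPy sketch; the sequence length and window size below are arbitrary examples, not Gemma's actual configuration):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal mask where each token attends only to the last `window` tokens."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    return (j <= i) & (i - j < window)

def global_causal_mask(seq_len: int) -> np.ndarray:
    """Standard causal mask: each token attends to all preceding tokens."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return j <= i

# A local layer with window 4 touches at most 4 keys per query (constant
# memory per token), while a global layer touches up to seq_len keys.
local = sliding_window_mask(8, 4)
glob = global_causal_mask(8)
```

Interleaving layers of both kinds is what keeps long-context memory cost manageable while still letting information propagate across the full window through the global layers.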
Key Capabilities
- Multimodality: Processes text, image, and video inputs, with variable aspect ratio and resolution support for images.
- Reasoning: Includes configurable thinking modes for step-by-step reasoning.
- Extended Context: Supports a 256K token context window, ideal for long-form content and complex interactions.
- Coding & Agentic Features: Enhanced performance in coding benchmarks and native function-calling support for autonomous agents.
- System Prompt Support: Introduces native support for the system role, allowing for more structured and controllable conversations.
- Efficient Architecture: Utilizes a Mixture-of-Experts (MoE) design, activating a smaller subset of parameters (3.8B) for faster inference while maintaining high performance.
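The system-role and function-calling capabilities above are typically exercised through a chat-style message format. The following sketch shows one plausible conversation structure; the `get_weather` tool is a hypothetical example, and the exact schema the model expects should be checked against its chat template:

```python
# Stub tool for illustration only; a real tool would call an API.
def get_weather(city: str) -> str:
    """Return the current weather for a city (hypothetical example tool)."""
    return f"Sunny in {city}"

# OpenAI-style tool declaration, as commonly accepted by chat templates.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Native system-role support means the steering instruction lives in its
# own message rather than being prepended to the user turn.
messages = [
    {"role": "system", "content": "You are a concise weather assistant."},
    {"role": "user", "content": "What's the weather in Zurich?"},
]
```

In a real deployment, `messages` and `tools` would be passed to the model's chat template for generation; the point here is only the structure that the native system role and function calling are designed to consume.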
Good For
- Complex Reasoning Tasks: Its built-in reasoning modes and large context window make it suitable for intricate problem-solving.
- Code Generation and Agentic Workflows: Excels in coding benchmarks and supports function calling for building intelligent agents.
- Multimodal Understanding: Ideal for applications requiring the interpretation of interleaved text, images, and video, such as document parsing, UI understanding, and video analysis.
- Fast Inference on Consumer Hardware: The MoE architecture allows for efficient deployment and faster inference compared to dense models of similar total parameter count.
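The efficiency claim can be made concrete with a toy top-k MoE router (an illustrative NumPy sketch; the expert count, dimensions, and routing details are arbitrary and not Gemma's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, top_k, d = 8, 2, 16                 # toy sizes for illustration
experts = rng.normal(size=(n_experts, d, d))   # one weight matrix per expert
router = rng.normal(size=(d, n_experts))       # learned routing projection

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                              # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # chosen expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()                     # softmax over chosen experts
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ experts[e])
    return out

tokens = rng.normal(size=(4, d))
y = moe_layer(tokens)
# Only top_k / n_experts of the expert weights run per token (2/8 here),
# which is the mechanism behind "3.8B active of 25.2B total" parameters.
```

Because most expert weights sit idle on any given token, per-token compute scales with the active parameter count rather than the total, which is why an MoE model can outpace a dense model of comparable total size.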