google/gemma-4-26B-A4B-it
The google/gemma-4-26B-A4B-it is a 25.2 billion total parameter Mixture-of-Experts (MoE) model from Google DeepMind's Gemma 4 family, featuring 3.8 billion active parameters for efficient inference. This multimodal model accepts text, image, and video inputs, generates text outputs, and is optimized for reasoning, coding, and agentic workflows. It supports a 256K token context window and includes native function-calling and system prompt support, making it suitable for complex, long-context tasks.
Overview
The google/gemma-4-26B-A4B-it is a 25.2 billion total parameter Mixture-of-Experts (MoE) model from Google DeepMind's Gemma 4 family, designed for efficient inference with only 3.8 billion active parameters. It is a multimodal model capable of processing text, image, and video inputs to generate text outputs. The model features a substantial 256K token context window, enabling it to handle complex, long-context tasks effectively. Gemma 4 models are built with a hybrid attention mechanism, combining local sliding window attention with global attention for optimized processing speed and memory footprint.
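As a rough illustration of how a checkpoint like this is typically loaded and prompted, here is a minimal sketch using the Hugging Face transformers library. The Hub id is taken from this page's title, and the availability of a standard `AutoProcessor`/`AutoModelForCausalLM` interface for this checkpoint is an assumption, not a confirmed detail; the actual model may require a dedicated multimodal model class.

```python
# Minimal sketch: loading and prompting the model via Hugging Face transformers.
# Assumes the checkpoint is published under this id and exposes the standard
# processor/model interface; verify against the official model card.
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "google/gemma-4-26B-A4B-it"  # assumed Hub id, from this page's title

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit the 25.2B total parameters
    device_map="auto",           # shard across available accelerators
)

inputs = processor(
    text="Explain mixture-of-experts routing in two sentences.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(outputs[0], skip_special_tokens=True))
```

Note that only the 3.8B active parameters are exercised per token at inference time, but all 25.2B total parameters must still fit in memory, which is why half precision and sharding are used above.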
Key Capabilities
- Multimodality: Processes text, image, and video inputs, with variable aspect ratio and resolution support for images.
- Reasoning: Includes configurable thinking modes for step-by-step reasoning.
- Extended Context: Supports a 256K token context window, ideal for long-form content and complex interactions.
- Coding & Agentic Features: Enhanced performance in coding benchmarks and native function-calling support for autonomous agents (see the request sketch after this list).
- System Prompt Support: Introduces native support for the `system` role, allowing for more structured and controllable conversations.
- Efficient Architecture: Utilizes a Mixture-of-Experts (MoE) design, activating a smaller subset of parameters (3.8B) for faster inference while maintaining high performance.
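To make the function-calling and system-role capabilities concrete, the sketch below shows an OpenAI-style chat request with a tool definition. The endpoint, the `get_weather` tool, and the assumption that the model is served behind an OpenAI-compatible API (e.g., via vLLM) are illustrative placeholders; the exact request format depends on your serving stack.

```python
# Illustrative sketch: a function-calling request with a native system prompt.
# Assumes an OpenAI-compatible serving endpoint; base_url, tool schema, and
# get_weather are hypothetical placeholders, not a documented API for this model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="google/gemma-4-26B-A4B-it",
    messages=[
        # Native system-role support: instructions go in a dedicated message
        # instead of being prepended to the user turn.
        {"role": "system", "content": "You are a concise weather assistant."},
        {"role": "user", "content": "What's the weather in Zurich right now?"},
    ],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```

In an agentic loop, the returned tool call would be executed locally and its result appended as a `tool` message before requesting the model's next turn.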
Good For
- Complex Reasoning Tasks: Its built-in reasoning modes and large context window make it suitable for intricate problem-solving.
- Code Generation and Agentic Workflows: Excels in coding benchmarks and supports function calling for building intelligent agents.
- Multimodal Understanding: Ideal for applications requiring the interpretation of interleaved text, images, and video, such as document parsing, UI understanding, and video analysis (see the request sketch after this list).
- Fast Inference on Consumer Hardware: The MoE architecture allows for efficient deployment and faster inference compared to dense models of similar total parameter count.
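For the multimodal use cases above, a request typically interleaves image content with text. The sketch below reuses the OpenAI-compatible chat format; the endpoint and image URL are placeholders, and the exact content-part schema (particularly for video input) may differ by provider.

```python
# Sketch of a multimodal (image + text) chat request in the OpenAI-compatible
# format; the endpoint and image URL are placeholders for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="google/gemma-4-26B-A4B-it",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/invoice.png"}},  # placeholder
            {"type": "text",
             "text": "Extract the invoice number and total amount as JSON."},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```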