unsloth/gemma-4-26B-A4B

Tags: Vision, Open Weights, Warm
Concurrency Cost: 2
Model Size: 26B
Quant: FP8
Ctx Length: 32k
Tool Calling: Supported
Published: Mar 31, 2026
License: apache-2.0
Architecture: Transformer

unsloth/gemma-4-26B-A4B is a 25.2 billion parameter Mixture-of-Experts (MoE) multimodal language model developed by Google DeepMind, with 3.8 billion active parameters for efficient inference. It accepts text and image inputs, offers a 256K token context window, and targets reasoning, coding, and agentic workflows. The design trades a larger total parameter count for fast inference while keeping strong benchmark performance.


Overview

unsloth/gemma-4-26B-A4B is a 25.2 billion parameter multimodal Mixture-of-Experts (MoE) model from the Gemma 4 family, developed by Google DeepMind. It is designed for efficient deployment: only 3.8 billion parameters are active during inference, so it runs considerably faster than its total parameter count suggests. The model accepts text and image inputs, has a 256K token context window, and aims for frontier-level performance in its size class.
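
The relationship between total and active parameters can be made concrete with a little arithmetic. The sketch below uses only the figures quoted above (25.2B total, 3.8B active) plus the FP8 quantization from the listing, which stores one byte per parameter. The function names are illustrative, the BF16 line is an assumption added for comparison, and the memory estimate covers weights only (no KV cache, activations, or runtime overhead).

```python
# Figures taken from the model description above.
TOTAL_PARAMS = 25.2e9   # total parameters
ACTIVE_PARAMS = 3.8e9   # parameters active during inference


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in a forward pass."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return params * bytes_per_param / 1e9


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
fp8_gb = weight_memory_gb(TOTAL_PARAMS, 1.0)   # FP8: 1 byte per parameter
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2.0)  # BF16: 2 bytes (comparison only)

print(f"active fraction: {fraction:.1%}")
print(f"FP8 weights:  ~{fp8_gb:.1f} GB")
print(f"BF16 weights: ~{bf16_gb:.1f} GB")
```

Roughly 15% of the weights are used in any one forward pass, but the memory estimate is based on the total count, because every expert has to be resident for the router to choose among them.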

Key Capabilities

  • Multimodal Understanding: Processes text and image inputs, with variable aspect ratio and resolution support. It can analyze video by processing sequences of frames.
  • Reasoning: Features configurable thinking modes for step-by-step problem-solving.
  • Coding & Agentic Capabilities: Achieves notable improvements in coding benchmarks and includes native function-calling support for autonomous agents.
  • Long Context: Supports a 256K token context window, utilizing a hybrid attention mechanism for efficient processing of long sequences.
  • Efficient Architecture: As an MoE model, it activates only 3.8 billion parameters during inference, giving speeds comparable to a 4B parameter model while drawing on the larger 25.2 billion total parameter count for quality.
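
Because video is handled as a sequence of frames, a caller typically has to decide which frames to pass in as images. The following is a minimal sketch of one common choice, uniform sampling, and is not taken from the model's documentation: the function name `sample_frame_indices` and the midpoint strategy are assumptions for illustration, and decoding the frames themselves is left out.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick evenly spaced frame indices from a clip.

    The clip is split into num_samples equal-width segments and the
    frame at the midpoint of each segment is selected.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Short clip: keep every frame.
        return list(range(total_frames))
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


# A 100-frame clip reduced to 4 representative frames.
indices = sample_frame_indices(100, 4)
print(indices)  # [12, 37, 62, 87]
```

Fewer frames shorten the prompt at the cost of temporal detail, so the sample count is a tuning knob rather than a fixed value.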

Good For

  • Reasoning-intensive tasks: Configurable thinking modes support step-by-step problem-solving.
  • Coding and agentic workflows: Enhanced coding benchmarks and function-calling support make it suitable for development and automation.
  • Multimodal applications: Ideal for tasks requiring both text and image understanding, such as document parsing, visual question answering, and video analysis.
  • Deployment on consumer GPUs and workstations: Sized for hardware beyond mobile devices, where the low active parameter count keeps inference fast.
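
For the agentic workflows mentioned above, the application side of function calling usually amounts to describing tools, receiving a structured call from the model, and routing it to local code. The sketch below shows that routing step only. It assumes the tool call arrives as a JSON object with `name` and `arguments` keys, which is a common convention and not something this page specifies. The `get_weather` tool, the `TOOL_SPECS` layout, and the `dispatch` helper are all hypothetical.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical tool: returns canned data instead of calling a real API."""
    return {"city": city, "forecast": "sunny"}


# Registry mapping tool names to local Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}

# JSON Schema-style description of the tool (assumed layout), the kind of
# structure typically sent alongside the prompt.
TOOL_SPECS = [
    {
        "name": "get_weather",
        "description": "Look up the forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]


def dispatch(tool_call_json: str) -> str:
    """Run the requested tool and return its result as a JSON string."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("arguments", {}))
    return json.dumps(result)


# Stand-in for a tool call emitted by the model (assumed format).
example_call = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
reply = dispatch(example_call)
print(reply)
```

Returning an error object for unknown tool names, instead of raising, lets the result be fed back to the model so it can correct its next call.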