RedHatAI/gemma-4-26B-A4B-it

VISIONConcurrency Cost:2Model Size:26BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 14, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

RedHatAI/gemma-4-26B-A4B-it is an instruction-tuned multimodal language model from the Gemma 4 family, developed by Google DeepMind. This 26 billion parameter Mixture-of-Experts (MoE) model features 3.8 billion active parameters for efficient inference and supports a 256K token context window. It excels in reasoning, coding, and multimodal understanding, processing text, image, and video inputs to generate text outputs.

Loading preview...

Gemma 4 26B A4B-it: Multimodal MoE for Reasoning and Coding

RedHatAI/gemma-4-26B-A4B-it is an instruction-tuned model from the Gemma 4 family, developed by Google DeepMind. This model is a Mixture-of-Experts (MoE) variant with 25.2 billion total parameters, but only 3.8 billion active parameters during inference, allowing for faster execution comparable to a 4B model. It supports a substantial 256K token context window and is designed for multimodal understanding, processing text, image, and video inputs.

Key Capabilities

  • Multimodal Processing: Handles text, image, and video inputs, with variable aspect ratio and resolution support for images.
  • Reasoning: Features configurable thinking modes for step-by-step problem-solving.
  • Efficient Architecture: Utilizes a hybrid attention mechanism and MoE design for optimized performance and memory usage.
  • Enhanced Coding & Agentic Capabilities: Shows significant improvements in coding benchmarks and includes native function-calling support for autonomous agents.
  • Native System Prompt Support: Allows for more structured and controllable conversations.

Good For

  • Reasoning-intensive tasks: Leveraging its built-in thinking mode.
  • Agentic workflows: Utilizing native function-calling support.
  • Coding tasks: Including generation, completion, and correction.
  • Multimodal applications: Integrating text, image, and video understanding for diverse use cases like document parsing, screen understanding, and video analysis.