google/gemma-4-26B-A4B

Hugging Face
TEXT GENERATIONConcurrency Cost:2Model Size:26BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Mar 12, 2026License:apache-2.0Architecture:Transformer0.3K Open Weights Warm

Gemma 4 26B A4B is a multimodal Mixture-of-Experts (MoE) model developed by Google DeepMind, part of the Gemma 4 family. It features 25.2 billion total parameters with 3.8 billion active parameters, supporting text and image input with a 256K token context window. This model is optimized for fast inference, excelling in reasoning, coding, and agentic workflows, making it suitable for consumer GPUs and workstations.

Loading preview...

Overview

Google DeepMind's Gemma 4 26B A4B is a multimodal Mixture-of-Experts (MoE) model, part of the Gemma 4 family, designed for frontier-level performance. It processes text and image inputs, with a focus on efficient deployment on consumer GPUs and workstations. The model features a hybrid attention mechanism for speed and long-context awareness, and supports a substantial 256K token context window.

Key Capabilities

  • Multimodal Input: Handles text and image inputs, with variable aspect ratio and resolution support.
  • Efficient Architecture: Utilizes a Mixture-of-Experts (MoE) design with 25.2 billion total parameters and 3.8 billion active parameters, enabling fast inference.
  • Reasoning & Coding: Designed as a highly capable reasoner with configurable thinking modes and enhanced coding benchmarks, including native function-calling support.
  • Extended Context: Supports a 256K token context window, crucial for complex, long-context tasks.
  • Native System Prompt Support: Introduces native support for the system role for structured conversations.

Good For

  • Reasoning and Agentic Workflows: Its strong reasoning capabilities and function-calling support make it ideal for autonomous agents.
  • Coding Tasks: Excels in code generation, completion, and correction.
  • Multimodal Understanding: Suitable for applications requiring both text and image processing, such as object detection, document parsing, and UI understanding.
  • Fast Inference: The MoE architecture allows it to run almost as fast as a 4B-parameter model, making it efficient for deployment.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p