google/gemma-4-26B-A4B-it

Hugging Face
TEXT GENERATIONConcurrency Cost:2Model Size:26BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Mar 11, 2026License:apache-2.0Architecture:Transformer1.1K Open Weights Warm

Gemma 4 26B A4B-it is a 25.2 billion parameter multimodal Mixture-of-Experts (MoE) model developed by Google DeepMind, part of the Gemma 4 family. It features 3.8 billion active parameters for efficient inference, a 256K token context window, and supports text, image, and video inputs. This instruction-tuned variant excels in reasoning, coding, and agentic workflows, offering enhanced capabilities over previous Gemma models.

Loading preview...

Gemma 4 26B A4B-it Overview

This model is a 25.2 billion parameter instruction-tuned Mixture-of-Experts (MoE) variant from Google DeepMind's Gemma 4 family. It stands out with only 3.8 billion active parameters during inference, making it highly efficient while delivering strong performance. The model supports a substantial 256K token context window and is multimodal, processing text, image, and video inputs to generate text outputs. It is designed with configurable thinking modes for enhanced reasoning and includes native function-calling support for agentic workflows.

Key Capabilities

  • Multimodality: Processes text, image (with variable aspect ratio and resolution), and video inputs. The E2B and E4B models also support audio.
  • Efficient Architecture: Utilizes a Mixture-of-Experts (MoE) design with 3.8B active parameters for fast inference, comparable to a 4B-parameter model.
  • Extended Context: Features a 256K token context window, enabling complex, long-context tasks.
  • Enhanced Reasoning & Coding: Designed for strong reasoning capabilities and achieves notable improvements in coding benchmarks, including native function-calling.
  • Multilingual Support: Pre-trained on over 140 languages with out-of-the-box support for 35+ languages.
  • Native System Prompt Support: Allows for more structured and controllable conversations.

Good For

  • Reasoning-intensive tasks: Benefits from configurable thinking modes.
  • Agentic workflows: Leverages native function-calling support.
  • Code generation and understanding: Shows improved performance in coding benchmarks.
  • Multimodal applications: Handles interleaved text, image, and video inputs effectively.
  • Deployment on consumer GPUs and workstations: Optimized for scalable deployment with efficient inference.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p