google/gemma-4-26B-A4B-it

Hugging Face
TEXT GENERATIONConcurrency Cost:2Model Size:26BQuant:FP8Ctx Length:32kPublished:Mar 11, 2026License:apache-2.0Architecture:Transformer1.0K Open Weights Warm

The google/gemma-4-26B-A4B-it is a 25.2 billion parameter multimodal Mixture-of-Experts (MoE) model developed by Google DeepMind, part of the Gemma 4 family. It processes text and image inputs, generating text outputs, and features a 256K token context window. Optimized for reasoning, coding, and agentic capabilities, this instruction-tuned model is designed for efficient deployment on consumer GPUs and workstations.

Loading preview...

Gemma 4: Multimodal MoE for Advanced Reasoning and Coding

Google DeepMind's Gemma 4 family introduces multimodal models capable of processing text and image inputs (with audio on smaller variants) to generate text outputs. The google/gemma-4-26B-A4B-it is a 25.2 billion parameter instruction-tuned Mixture-of-Experts (MoE) model, featuring 3.8 billion active parameters for fast inference, making it suitable for consumer GPUs and workstations.

Key Capabilities & Advancements

  • Multimodality: Handles Text, Image (variable aspect ratio/resolution), and Video inputs, with native audio support on E2B/E4B models.
  • Reasoning: Designed with configurable thinking modes for highly capable reasoning.
  • Extended Context Window: Supports a 256K token context window.
  • Enhanced Coding & Agentic Capabilities: Achieves significant improvements in coding benchmarks and includes native function-calling support.
  • Native System Prompt Support: Enables more structured and controllable conversations.
  • Multilingual: Pre-trained on over 140 languages, with out-of-the-box support for 35+ languages.

Performance Highlights

The 26B A4B MoE model demonstrates strong performance across various benchmarks, including:

  • MMLU Pro: 82.6%
  • AIME 2026 no tools: 88.3%
  • LiveCodeBench v6: 77.1%
  • GPQA Diamond: 82.3%
  • MMMU Pro (Vision): 73.8%

Ideal Use Cases

This model is well-suited for:

  • Content Creation: Generating creative text formats, marketing copy, and email drafts.
  • Conversational AI: Powering chatbots, virtual assistants, and interactive applications.
  • Coding: Code generation, completion, and correction.
  • Multimodal Understanding: Object detection, document/PDF parsing, UI understanding, chart comprehension, and OCR.
  • Agentic Workflows: Leveraging native function-calling for structured tool use.

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model. Click a tab to see each config.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p