Name: google/gemma-4-26B-A4B-it-qat-q4_0-unquantized API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: google

Gemma 4 26B A4B MoE: Multimodal, Efficient, and Reasoning-Capable

This model is a Gemma 4 family member developed by Google DeepMind, featuring a Mixture-of-Experts (MoE) architecture. It is optimized with Quantization-Aware Training (QAT), allowing for significantly reduced memory requirements while maintaining quality. The model processes text and image inputs, generating text outputs, and supports a 256K token context window.

Key Capabilities

Multimodal: Processes text and image inputs, with variable aspect ratio and resolution support for images. It can analyze video by processing sequences of frames.
Efficient Architecture: As a 25.2 billion total parameter MoE model, it activates only 3.8 billion parameters during inference, providing performance comparable to a 4B parameter model.
Reasoning: Designed with configurable thinking modes for step-by-step reasoning.
Enhanced Coding & Agentic Capabilities: Achieves notable improvements in coding benchmarks and includes native function-calling support for autonomous agents.
Multilingual: Supports over 140 languages in pre-training and 35+ languages out-of-the-box.
Native System Prompt Support: Allows for more structured and controllable conversations.

Good For

Fast Inference: Ideal for scenarios requiring efficient processing due to its MoE architecture's low active parameter count.
Complex Reasoning Tasks: Benefits from its built-in reasoning mode and strong performance on benchmarks like MMLU Pro and AIME 2026.
Multimodal Applications: Suitable for tasks involving interleaved text and image inputs, such as object detection, document parsing, and UI understanding.
Coding and Agentic Workflows: Excels in code generation, completion, correction, and structured tool use.

Overview

Gemma 4 26B A4B MoE: Multimodal, Efficient, and Reasoning-Capable

Key Capabilities

Good For

Full Model Card (README)