RedHatAI/gemma-4-31B-it

VISIONConcurrency Cost:2Model Size:31BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 18, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

Gemma 4 is a family of open multimodal models developed by Google DeepMind, including this 31 billion parameter instruction-tuned variant. These models handle text and image input, generating text output, with smaller variants also supporting audio. Noteworthy for their advanced reasoning, extended multimodality, and enhanced coding capabilities, Gemma 4 models are designed for diverse deployments from mobile to servers, excelling in agentic workflows and long-context tasks up to 256K tokens.

Loading preview...

Gemma 4: Multimodal Models by Google DeepMind

Gemma 4 is a family of open multimodal models from Google DeepMind, offering both dense and Mixture-of-Experts (MoE) architectures. This release includes pre-trained and instruction-tuned variants, with the 31B model being a dense instruction-tuned version. These models are designed for text and image input, generating text output, while smaller E2B and E4B models also natively support audio.

Key Capabilities & Advancements

  • Multimodality: Processes text, image (with variable aspect ratio and resolution), and video across all models. E2B and E4B models additionally support audio.
  • Reasoning: Features configurable thinking modes for step-by-step reasoning.
  • Extended Context Window: Supports up to 256K tokens for medium models (including 31B) and 128K for smaller models.
  • Enhanced Coding & Agentic Capabilities: Demonstrates significant improvements in coding benchmarks and includes native function-calling support for autonomous agents.
  • Native System Prompt Support: Integrates a system role for more structured and controllable conversations.
  • Multilingual Support: Pre-trained on over 140 languages with out-of-the-box support for 35+ languages.

Performance Highlights

The Gemma 4 31B model shows strong performance across various benchmarks, including:

  • MMLU Pro: 85.2%
  • AIME 2026 no tools: 89.2%
  • LiveCodeBench v6: 80.0%
  • GPQA Diamond: 84.3%
  • MMMU Pro (Vision): 76.9%
  • MATH-Vision: 85.6%
  • Long Context (MRCR v2 8 needle 128k): 66.4%

Intended Usage

Gemma 4 models are well-suited for a wide range of applications:

  • Content Creation: Text generation, chatbots, summarization, image data extraction.
  • Research & Education: NLP and VLM research, language learning tools, knowledge exploration.
  • Agentic Workflows: Leveraging function calling for structured tool use.
  • Coding: Code generation, completion, and correction.