unsloth/gemma-4-31B

Hugging Face
VISIONConcurrency Cost:2Model Size:31BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Mar 31, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

The unsloth/gemma-4-31B is a 30.7 billion parameter multimodal language model developed by Google DeepMind, part of the Gemma 4 family. This dense model supports text and image input with a 256K token context window, excelling in reasoning, coding, and agentic workflows. It features a hybrid attention mechanism for efficient long-context processing and native system prompt support.

Loading preview...

unsloth/gemma-4-31B: A Multimodal Powerhouse

This model is the 31 billion parameter dense variant from Google DeepMind's Gemma 4 family, designed for frontier-level performance on consumer GPUs and workstations. It is a multimodal model, capable of processing both text and image inputs, and generating text outputs. A key architectural innovation is its hybrid attention mechanism, combining local sliding window attention with global attention, optimized for speed and low memory footprint while handling complex, long-context tasks up to 256K tokens.

Key Capabilities

  • Multimodal Understanding: Processes text and images, with variable aspect ratio and resolution support. Video analysis is also supported by processing sequences of frames.
  • Advanced Reasoning: Designed as a highly capable reasoner with configurable thinking modes, allowing step-by-step internal reasoning.
  • Enhanced Coding & Agentic Workflows: Shows significant improvements in coding benchmarks and includes native function-calling support for autonomous agents.
  • Long Context: Features a substantial 256K token context window, enabling deep awareness for complex tasks.
  • Native System Prompt Support: Introduces native support for the system role, facilitating more structured and controllable conversations.

Good For

  • Complex Reasoning Tasks: Leverage its advanced reasoning capabilities and configurable thinking modes.
  • Multimodal Applications: Ideal for scenarios requiring both text and image understanding, such as document parsing, visual QA, or content creation involving visual elements.
  • Code Generation and Agent Development: Its strong performance in coding benchmarks and function-calling support make it suitable for programming assistance and building intelligent agents.
  • Long-Context Processing: Efficiently handles extensive inputs, beneficial for summarizing large documents or engaging in prolonged conversations.