google/gemma-4-31B

Hugging Face
Text Generation · Concurrency Cost: 2 · Model Size: 31B · Quant: FP8 · Ctx Length: 32k · Published: Mar 12, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Gemma 4 31B is a 30.7-billion-parameter multimodal large language model developed by Google DeepMind, part of the Gemma 4 family. This dense model accepts text and image inputs, generates text output, and offers a 256K-token context window. It is optimized for advanced reasoning, coding, and agentic capabilities, making it suitable for complex tasks on consumer GPUs and workstations.


Overview

Google DeepMind's Gemma 4 models are a family of open-weight, multimodal LLMs; the 31B variant uses a dense architecture. These models accept text and image input (with audio support on the smaller E2B/E4B models) and produce text output, with a context window of up to 256K tokens. Gemma 4 introduces significant advances in reasoning, extended multimodal support, and enhanced coding and agentic capabilities, including native function-calling support and system prompt integration.

Key Capabilities

  • Multimodal Understanding: Processes text, images (with variable aspect ratio and resolution), and video. The E2B and E4B models also natively support audio.
  • Advanced Reasoning: All models are designed as highly capable reasoners with configurable thinking modes.
  • Extended Context Window: Supports up to 256K tokens for the 26B A4B and 31B models, and 128K for smaller models.
  • Enhanced Coding & Agentic Features: Improved coding benchmarks and native function-calling for autonomous agents.
  • Multilingual Support: Pre-trained on over 140 languages with out-of-the-box support for 35+ languages.
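To make the function-calling and system-prompt capabilities above concrete, here is a minimal sketch of assembling an OpenAI-compatible chat request for the model. The model identifier, tool name, and schema are illustrative assumptions, not an official API for this model.

```python
import json

# Assumed hub identifier for the model; adjust to your provider's naming.
MODEL_ID = "google/gemma-4-31B"

def build_function_call_request(user_prompt: str) -> dict:
    """Assemble a chat-completions payload exercising native function calling."""
    return {
        "model": MODEL_ID,
        "messages": [
            # Gemma 4 supports system prompts natively.
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical example tool
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

request = build_function_call_request("What's the weather in Zurich?")
print(json.dumps(request, indent=2))
```

A payload like this would be POSTed to an OpenAI-compatible chat-completions endpoint; the model then either answers directly or returns a `tool_calls` entry naming the function to invoke.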

Good For

  • Complex Reasoning Tasks: Leveraging its built-in reasoning mode for step-by-step problem-solving.
  • Multimodal Applications: Integrating text and image inputs for tasks like object detection, document parsing, and UI understanding.
  • Coding and Agentic Workflows: Generating, completing, and correcting code, and powering autonomous agents with function-calling.
  • Long-Context Applications: Handling extensive documents or conversations due to its large context window.
  • Research and Development: Serving as a foundation for VLM and NLP research, and developing advanced AI applications.
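For long-context applications, it helps to budget tokens before sending a request. The sketch below estimates whether a document fits in the 256K-token window using a rough 4-characters-per-token heuristic; this ratio is a common rule of thumb, not the model's actual tokenizer, so use the real tokenizer for production budgeting.

```python
# Crude context-budget check for Gemma 4's 256K-token window.
CONTEXT_WINDOW = 256_000   # tokens, per the model card
CHARS_PER_TOKEN = 4        # heuristic assumption, not the real tokenizer

def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """True if the prompt plus an output budget fits the window."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW

doc = "word " * 100_000  # ~500K characters
print(estimate_tokens(doc), fits_in_context(doc))  # → 125000 True
```

Reserving an output budget up front (here 4,096 tokens) avoids the common failure where a prompt technically fits but leaves no room for the model's response.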