Gemma 4 31B QAT Unquantized Overview

This model is part of the Gemma 4 family by Google DeepMind, a multimodal LLM designed for text and image input with text output. It's an instruction-tuned variant, specifically an unquantized QAT (Quantization-Aware Training) checkpoint, which allows for near bfloat16 quality with significantly reduced memory footprint, making it suitable for custom compilation and research.

Key Capabilities

Multimodal Understanding: Processes text and image inputs, with support for variable aspect ratios and resolutions. The 31B model includes a ~550M parameter vision encoder.
Reasoning: Designed with highly capable reasoning abilities, including configurable thinking modes.
Extended Context Window: Supports a substantial 256K token context length, enabling complex, long-context tasks.
Enhanced Coding & Agentic Capabilities: Shows improvements in coding benchmarks and features native function-calling support for autonomous agents.
Multilingual Support: Pre-trained on over 140 languages, with out-of-the-box support for 35+ languages.
Native System Prompt Support: Integrates native support for the system role for more structured conversations.

Performance Highlights

The Gemma 4 31B model demonstrates strong performance across various benchmarks:

MMLU Pro: 85.2%
AIME 2026 no tools: 89.2%
LiveCodeBench v6: 80.0%
GPQA Diamond: 84.3%
MMMU Pro (Vision): 76.9%

Good For

This model is well-suited for applications requiring advanced reasoning, coding, and multimodal understanding, particularly where memory efficiency and high performance are critical. Its unquantized QAT format makes it an excellent choice for developers and researchers looking to build custom solutions or conduct further optimization.

Overview

Gemma 4 31B QAT Unquantized Overview

Key Capabilities

Performance Highlights

Good For

Full Model Card (README)