unsloth/gemma-4-31B-it-qat-q4_0-unquantized
The unsloth/gemma-4-31B-it-qat-q4_0-unquantized model is a 31 billion parameter instruction-tuned multimodal language model from Google DeepMind's Gemma 4 family, optimized with Quantization-Aware Training (QAT) for efficient deployment. It features a 32768-token context window and excels in reasoning, coding, and multimodal understanding, processing text and image inputs to generate text outputs. This specific unquantized QAT checkpoint is ideal for custom downstream compilation and research, offering high performance while reducing memory requirements.
Loading preview...
Gemma 4 31B QAT Unquantized Overview
This model is part of the Gemma 4 family by Google DeepMind, a multimodal LLM designed for text and image input with text output. It's an instruction-tuned variant, specifically an unquantized QAT (Quantization-Aware Training) checkpoint, which allows for near bfloat16 quality with significantly reduced memory footprint, making it suitable for custom compilation and research.
Key Capabilities
- Multimodal Understanding: Processes text and image inputs, with support for variable aspect ratios and resolutions. The 31B model includes a ~550M parameter vision encoder.
- Reasoning: Designed with highly capable reasoning abilities, including configurable thinking modes.
- Extended Context Window: Supports a substantial 256K token context length, enabling complex, long-context tasks.
- Enhanced Coding & Agentic Capabilities: Shows improvements in coding benchmarks and features native function-calling support for autonomous agents.
- Multilingual Support: Pre-trained on over 140 languages, with out-of-the-box support for 35+ languages.
- Native System Prompt Support: Integrates native support for the
systemrole for more structured conversations.
Performance Highlights
The Gemma 4 31B model demonstrates strong performance across various benchmarks:
- MMLU Pro: 85.2%
- AIME 2026 no tools: 89.2%
- LiveCodeBench v6: 80.0%
- GPQA Diamond: 84.3%
- MMMU Pro (Vision): 76.9%
Good For
This model is well-suited for applications requiring advanced reasoning, coding, and multimodal understanding, particularly where memory efficiency and high performance are critical. Its unquantized QAT format makes it an excellent choice for developers and researchers looking to build custom solutions or conduct further optimization.