nvidia/Gemma-4-31B-IT-NVFP4

Hugging Face
Text Generation · Concurrency Cost: 2 · Model Size: 31B · Quant: NVFP4 · Ctx Length: 32k · Published: Apr 2, 2026 · License: apache-license-2.0 · Architecture: Transformer · Open Weights

The Gemma 4 31B IT model, developed by Google DeepMind, is a 30.7 billion parameter open multimodal model capable of processing text, image, and video inputs to generate text outputs. It features a 256K-token context window and supports over 140 languages, utilizing a hybrid attention mechanism for long-context performance. This NVIDIA-quantized NVFP4 version is optimized for reasoning, agentic workflows, coding, and multimodal understanding on consumer GPUs and workstations.


Model Overview

Gemma 4 31B IT is a 30.7 billion parameter multimodal model developed by Google DeepMind, designed for advanced reasoning, agentic workflows, coding, and comprehensive multimodal understanding. It accepts text, image, and video inputs, generates text outputs, and supports a 256K-token context window across more than 140 languages. This nvidia/Gemma-4-31B-IT-NVFP4 checkpoint is quantized to the NVFP4 data type with NVIDIA Model Optimizer for efficient inference on NVIDIA GPU-accelerated systems.
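As a hedged sketch of how such a checkpoint is typically queried once hosted behind an OpenAI-compatible server (e.g. vLLM or NVIDIA NIM): the endpoint path and serving setup below are assumptions, not part of this card; only the request payload shape, which follows the widely used Chat Completions schema, is shown.

```python
import json

# Build an OpenAI-style chat-completions request for a server hosting
# this checkpoint. The model id matches this card; endpoint/transport
# details (e.g. POST /v1/chat/completions) are assumptions.
def build_chat_request(prompt, model="nvidia/Gemma-4-31B-IT-NVFP4",
                       max_tokens=512, temperature=0.7):
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request("Summarize the Gemma 4 31B IT model card.")
body = json.dumps(payload)  # send with any HTTP client
```

The same payload works unchanged against any OpenAI-compatible serving stack; only the base URL and authentication differ between deployments.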

Key Capabilities & Features

  • Multimodal Input: Handles text, image, and video (up to 60 seconds at 1 fps) inputs, with support for variable image aspect ratios and resolutions.
  • Extended Context: Features a 256K-token input context length, enhanced by a hybrid attention mechanism and Proportional RoPE (p-RoPE) for long-context performance.
  • Broad Applications: Designed for text generation, chatbots, conversational AI, summarization, image data extraction, reasoning, coding, and function calling.
  • Quantized Performance: The NVFP4 quantization, achieved with NVIDIA Model Optimizer, maintains high performance as evidenced by evaluation results on benchmarks like GPQA Diamond (85.35%), AIME 2025 (87.60%), and MMLU Pro (84.94%), closely matching BF16 baseline scores.
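To make the quantization concrete, here is a simplified, pure-Python sketch of NVFP4-style block quantization. Assumptions: elements are stored as FP4 E2M1 values (magnitudes 0–6) with one scale per 16-element block; the real format also quantizes the block scale itself to FP8 (E4M3), which is omitted here for clarity.

```python
# Representable FP4 E2M1 magnitudes (sign is stored separately).
E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block (<=16 floats) to (scale, codes).
    Dequantized value = scale * code."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map largest magnitude to E2M1 max (6)
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Round to the nearest representable E2M1 magnitude.
        q = min(E2M1_LEVELS, key=lambda lvl: abs(lvl - mag))
        codes.append(q if x >= 0 else -q)
    return scale, codes

def dequantize_block(scale, codes):
    return [scale * c for c in codes]
```

Because each block carries its own scale, outliers in one block do not degrade the precision of neighboring blocks, which is a key reason block-scaled 4-bit formats track BF16 accuracy closely on the benchmarks above.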

Ideal Use Cases

This model is well-suited for developers requiring a powerful, multimodal LLM for:

  • Complex Reasoning Tasks: Excels in scenarios demanding deep understanding and logical inference.
  • Agentic Workflows: Facilitates the development of intelligent agents capable of interacting with diverse data types.
  • Code Generation & Understanding: Strong performance in coding benchmarks like LiveCodeBench (82.27% pass@1).
  • Multilingual Applications: Supports over 140 languages, making it versatile for global deployments.
  • Efficient Deployment: Optimized for NVIDIA GPUs; NVFP4 quantization reduces memory footprint and bandwidth relative to BF16, enabling inference on consumer GPUs and workstations.
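Since the card lists function calling among the model's applications, the following sketch illustrates that flow end to end. Assumptions: the tool schema follows the common OpenAI-style `tools` format, the `get_weather` tool is hypothetical, and the model's tool call is simulated rather than real model output.

```python
import json

# Hypothetical tool schema advertised to the model.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city):
    # Stand-in implementation so the dispatch below is runnable.
    return {"city": city, "temp_c": 21}

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call):
    """Route a model-emitted tool call to the matching local function."""
    fn = REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Simulated model output requesting a tool invocation:
simulated_call = {"function": {"name": "get_weather",
                               "arguments": json.dumps({"city": "Berlin"})}}
result = dispatch(simulated_call)
```

In a real agentic loop, `result` would be appended to the conversation as a tool-role message and the model queried again to produce the final answer.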