nvidia/Gemma-4-31B-IT-NVFP4

Hugging Face
Text Generation · Concurrency Cost: 2 · Model Size: 31B · Quant: NVFP4 · Ctx Length: 32k · Published: Apr 2, 2026 · License: apache-license-2.0 · Architecture: Transformer · Open Weights

The Gemma 4 31B IT model, developed by Google DeepMind, is a 30.7 billion parameter open multimodal model capable of processing text, image, and video inputs to generate text outputs. It features a 256K-token context window and supports over 140 languages, utilizing a hybrid attention mechanism for long-context performance. This NVIDIA-quantized NVFP4 version is optimized for reasoning, agentic workflows, coding, and multimodal understanding on consumer GPUs and workstations.


Model Overview

Gemma 4 31B IT is a 30.7 billion parameter multimodal model developed by Google DeepMind, designed for advanced reasoning, agentic workflows, coding, and comprehensive multimodal understanding. It accepts text, image, and video inputs, generates text outputs, and supports a 256K-token context window across more than 140 languages. This nvidia/Gemma-4-31B-IT-NVFP4 checkpoint is quantized to the NVFP4 data type with NVIDIA Model Optimizer for efficient inference on NVIDIA GPU-accelerated systems.
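As a hedged sketch of how such a checkpoint is typically queried once hosted behind an OpenAI-compatible server (e.g. vLLM or NVIDIA NIM): the endpoint path and serving setup below are assumptions, not part of this card; only the request payload shape, which follows the widely used Chat Completions schema, is shown.

```python
import json

# Build an OpenAI-style chat-completions request for a server hosting
# this checkpoint. The model id matches this card; endpoint/transport
# details (e.g. POST /v1/chat/completions) are assumptions.
def build_chat_request(prompt, model="nvidia/Gemma-4-31B-IT-NVFP4",
                       max_tokens=512, temperature=0.7):
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request("Summarize the Gemma 4 31B IT model card.")
body = json.dumps(payload)  # send with any HTTP client
```

The same payload works unchanged against any OpenAI-compatible serving stack; only the base URL and authentication differ between deployments.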

Key Capabilities & Features

  • Multimodal Input: Handles text, image, and video (up to 60 seconds at 1 fps) inputs, with support for variable image aspect ratios and resolutions.
  • Extended Context: Features a 256K-token input context length, enhanced by a hybrid attention mechanism and Proportional RoPE (p-RoPE) for long-context performance.
  • Broad Applications: Designed for text generation, chatbots, conversational AI, summarization, image data extraction, reasoning, coding, and function calling.
  • Quantized Performance: The NVFP4 quantization, achieved with NVIDIA Model Optimizer, maintains high performance as evidenced by evaluation results on benchmarks like GPQA Diamond (85.35%), AIME 2025 (87.60%), and MMLU Pro (84.94%), closely matching BF16 baseline scores.
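To make the quantization concrete, here is a simplified, pure-Python sketch of NVFP4-style block quantization. Assumptions: elements are stored as FP4 E2M1 values (magnitudes 0–6) with one scale per 16-element block; the real format also quantizes the block scale itself to FP8 (E4M3), which is omitted here for clarity.

```python
# Representable FP4 E2M1 magnitudes (sign is stored separately).
E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one block (<=16 floats) to (scale, codes).
    Dequantized value = scale * code."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map largest magnitude to E2M1 max (6)
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Round to the nearest representable E2M1 magnitude.
        q = min(E2M1_LEVELS, key=lambda lvl: abs(lvl - mag))
        codes.append(q if x >= 0 else -q)
    return scale, codes

def dequantize_block(scale, codes):
    return [scale * c for c in codes]
```

Because each block carries its own scale, outliers in one block do not degrade the precision of neighboring blocks, which is a key reason block-scaled 4-bit formats track BF16 accuracy closely on the benchmarks above.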

Ideal Use Cases

This model is well-suited for developers requiring a powerful, multimodal LLM for:

  • Complex Reasoning Tasks: Excels in scenarios demanding deep understanding and logical inference.
  • Agentic Workflows: Facilitates the development of intelligent agents capable of interacting with diverse data types.
  • Code Generation & Understanding: Strong performance in coding benchmarks like LiveCodeBench (82.27% pass@1).
  • Multilingual Applications: Supports over 140 languages, making it versatile for global deployments.
  • Efficient Deployment: Optimized for NVIDIA GPUs; NVFP4 quantization reduces memory footprint and bandwidth relative to BF16, enabling inference on consumer GPUs and workstations.
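Since the card lists function calling among the model's applications, the following sketch illustrates that flow end to end. Assumptions: the tool schema follows the common OpenAI-style `tools` format, the `get_weather` tool is hypothetical, and the model's tool call is simulated rather than real model output.

```python
import json

# Hypothetical tool schema advertised to the model.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city):
    # Stand-in implementation so the dispatch below is runnable.
    return {"city": city, "temp_c": 21}

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call):
    """Route a model-emitted tool call to the matching local function."""
    fn = REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# Simulated model output requesting a tool invocation:
simulated_call = {"function": {"name": "get_weather",
                               "arguments": json.dumps({"city": "Berlin"})}}
result = dispatch(simulated_call)
```

In a real agentic loop, `result` would be appended to the conversation as a tool-role message and the model queried again to produce the final answer.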