Name: nvidia/Cosmos-Reason2-32B API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: nvidia

NVIDIA Cosmos-Reason2-32B: Physical AI and Embodied Reasoning VLM

NVIDIA Cosmos-Reason2-32B is a 32 billion parameter Vision Language Model (VLM) specifically engineered for physical AI and robotics applications. Developed by NVIDIA, this model is built upon the Qwen3-VL-32B-Instruct architecture and is designed to enable robots and AI agents to reason about the physical world with human-like common sense, incorporating prior knowledge and physics understanding.

Key Capabilities and Features

Enhanced Physical AI Reasoning: Features improved spatio-temporal understanding and timestamp precision, crucial for dynamic environments.
Multimodal Input Support: Processes text, video (MP4), and image (JPG) inputs, with a recommended FPS=4 for video to match training.
Object Detection: Supports object detection with 2D/3D point localization and bounding box coordinates, accompanied by reasoning explanations.
Long-Context Understanding: Offers improved long-context processing, supporting up to 256K input tokens.
Commercial Use: The model is released under the NVIDIA Open Model License and is ready for commercial deployment.

Performance and Benchmarks

Cosmos-Reason2-32B demonstrates strong performance across various physical AI benchmarks, often outperforming Qwen3-VL-32B-Instruct in categories such as General (75.85% overall), Robotics (60.60% overall), Self-Driving (70.15% overall), and Smart Spaces (77.79% overall). It shows particular strength in tasks requiring physical common sense and embodied reasoning.

Ideal Use Cases

Robot Planning and Reasoning: Acts as a core component for deliberate decision-making in robot Vision-Language-Action (VLA) models, enabling robots to interpret complex commands and execute tasks with common sense.
Video Analytics AI Agents: Extracts insights and performs root-cause analysis from video data, suitable for city and industrial operations.
Data Curation and Annotation: Automates high-quality curation and annotation of large, diverse training datasets for physical AI development.

Overview

NVIDIA Cosmos-Reason2-32B: Physical AI and Embodied Reasoning VLM

Key Capabilities and Features

Performance and Benchmarks

Ideal Use Cases

Full Model Card (README)