sasa2000/Cosmos-Reason2-2B-heretic

VISIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:Mar 28, 2026License:nvidia-open-model-licenseArchitecture:Transformer Open Weights Cold

sasa2000/Cosmos-Reason2-2B-heretic is a 2.4B parameter vision language model (VLM) based on NVIDIA's Cosmos-Reason2-2B, fine-tuned for decensored responses using Heretic v1.2.0. This model excels in physical AI reasoning, spatio-temporal understanding, and object detection with 2D/3D point localization, supporting up to 256K input tokens. It is designed for applications in robotics, video analytics, and data curation, enabling agents to reason with common sense and physics understanding.

Loading preview...

Model Overview

sasa2000/Cosmos-Reason2-2B-heretic is a 2.4 billion parameter Vision Language Model (VLM) derived from NVIDIA's Cosmos-Reason2-2B, specifically modified for decensored outputs using the Heretic v1.2.0 tool. This model is built upon the Qwen3-VL-2B-Instruct architecture and is designed for physical AI and robotics applications, enabling agents to reason with human-like common sense and physics understanding.

Key Capabilities

  • Enhanced Physical AI Reasoning: Improves spatio-temporal understanding and timestamp precision for real-world interactions.
  • Object Detection: Supports 2D/3D point localization and bounding box coordinates with reasoning explanations.
  • Long-Context Understanding: Capable of processing up to 256K input tokens, crucial for complex scenarios.
  • Multimodal Input: Accepts both text and video/image inputs (MP4, JPG).
  • Reduced Refusals: Demonstrates significantly fewer refusals (19/100) compared to the original model (92/100).

Good For

  • Video Analytics AI Agents: Extracting insights and performing root-cause analysis from video data.
  • Data Curation and Annotation: Automating high-quality dataset preparation for physical AI development.
  • Robot Planning and Reasoning: Serving as the 'brain' for deliberate decision-making in embodied agents, interpreting environments and executing tasks with common sense.
  • Applications requiring decensored responses: Due to its modification using Heretic, it offers less restrictive output generation compared to its base model.