sensenova/SenseNova-SI-1.3-Qwen3-VL-8B

VISIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Apr 10, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

SenseNova-SI-1.3-Qwen3-VL-8B is an 8 billion parameter multimodal foundation model developed by SenseNova, built upon the Qwen3-VL architecture with a 32768 token context length. It is specifically designed and extensively trained on 14 million diverse data samples to enhance spatial intelligence capabilities, demonstrating unprecedented performance across a broad range of spatial intelligence benchmarks. This model excels at complex spatial reasoning tasks and open-ended spatial question-answering, making it suitable for applications requiring advanced visual-spatial understanding.

Loading preview...

SenseNova-SI-1.3-Qwen3-VL-8B: Enhanced Spatial Intelligence Multimodal Model

SenseNova-SI-1.3-Qwen3-VL-8B is a multimodal foundation model from the SenseNova-SI family, developed by SenseNova. It is built upon the Qwen3-VL architecture and features 8 billion parameters with a 32768 token context length. This model is specifically designed to address deficiencies in spatial intelligence often found in other multimodal models.

Key Capabilities and Differentiators

  • Superior Spatial Intelligence: The model is systematically trained on 14 million diverse data samples (SenseNova-SI-8M) under a rigorous taxonomy of spatial capabilities, leading to significantly enhanced spatial reasoning.
  • Benchmark Performance: It achieves unprecedented performance across a broad range of spatial intelligence benchmarks, including VSI (67.8), MMSI (39.5), MindCube-Tiny (68.3), ViewSpatial (55.8), SITE (57.5), BLINK (63.0), 3DSRBench (57.3), and EmbSpatial-Bench (82.1). These scores often surpass other open-source models in its size class and even some proprietary models.
  • Enhanced Open-Ended QA: Compared to previous versions, this model shows improved capabilities in open-ended spatial question-answering.
  • Robust Multimodal Understanding: While specializing in spatial intelligence, it maintains strong general multimodal understanding.

Should You Use This Model?

This model is ideal for use cases requiring advanced spatial reasoning and visual understanding. Consider SenseNova-SI-1.3-Qwen3-VL-8B if your application involves:

  • Complex Spatial Analysis: Tasks that demand precise understanding of object relationships, positions, and movements in 2D and 3D space.
  • Robotics and Navigation: Applications where understanding the physical environment and spatial relationships is critical.
  • Detailed Image Interpretation: Scenarios requiring more than just object recognition, but also spatial context and reasoning from visual inputs.
  • Benchmarking Spatial AI: Researchers and developers focused on evaluating and advancing spatial intelligence in AI models.