ScienceOne-AI/S1-VL-32B-RL
S1-VL-32B-RL is a 33.4 billion parameter multimodal large language model developed by the ScienceOne AI team at the Chinese Academy of Sciences and designed specifically for scientific domains. It natively supports two modes: Scientific Reasoning for complex problem-solving, and "Thinking with Images," in which the model actively calls code tools to perform image operations. The model performs strongly on scientific multimodal benchmarks, particularly when interpreting dense scientific charts, high-resolution imagery, and medical images.
S1-VL-32B-RL: Scientific Multimodal Reasoning Model
S1-VL-32B-RL, developed by the ScienceOne AI team at the Chinese Academy of Sciences, is a 33.4 billion parameter multimodal large language model optimized for scientific applications. It integrates two core reasoning paradigms: Scientific Reasoning for complex, multi-step problem analysis, and Thinking with Images, which enables the model to invoke code tools for image manipulation (cropping, zooming, enhancement, annotation) during inference.
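To make the crop-and-zoom idea concrete, here is a minimal sketch of the kind of image operation a code tool might perform. It is not the model's actual tool code: the function names (`crop`, `zoom`, `magnify_region`) are hypothetical, and the image is represented as a plain list of pixel rows so that the example runs with the Python standard library alone. A real tool would more likely operate on image files through an imaging library.

```python
from typing import List, Tuple

# Assumption for illustration: a grayscale image is a list of rows,
# each row a list of integer pixel intensities.
Image = List[List[int]]


def crop(image: Image, left: int, top: int, right: int, bottom: int) -> Image:
    """Return the sub-image covering columns [left, right) and rows [top, bottom)."""
    height = len(image)
    width = len(image[0]) if image else 0
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError("crop box lies outside the image")
    return [row[left:right] for row in image[top:bottom]]


def zoom(image: Image, factor: int) -> Image:
    """Upscale by an integer factor using nearest-neighbour replication."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    zoomed: Image = []
    for row in image:
        # Repeat every pixel horizontally, then repeat the widened row vertically.
        wide_row = [pixel for pixel in row for _ in range(factor)]
        zoomed.extend(list(wide_row) for _ in range(factor))
    return zoomed


def magnify_region(image: Image, box: Tuple[int, int, int, int], factor: int) -> Image:
    """Crop a region of interest, then enlarge it for closer inspection."""
    left, top, right, bottom = box
    return zoom(crop(image, left, top, right, bottom), factor)


if __name__ == "__main__":
    # A 4x4 test image whose pixel value encodes its position (10 * row + column).
    image = [[x + 10 * y for x in range(4)] for y in range(4)]
    for row in magnify_region(image, (1, 1, 3, 3), 2):
        print(row)
```

Cropping before zooming keeps the enlarged output small, which matters when the source image is very high resolution and only a small region is relevant to the question.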
Key Capabilities
- Scientific Multimodal Reasoning: Achieves state-of-the-art performance across diverse scientific benchmarks including MMMU, MathVision, and VRSBench-MINI, covering mathematics, physics, chemistry, astronomy, earth sciences, and biology.
- Image Operation Reasoning: Ranks first across five benchmarks (HRBench-4K, HRBench-8K, MME-RealWorld-CN, MME-RealWorld-Lite, V*), demonstrating superior ability in high-resolution image understanding and real-world visual reasoning.
- Code Tool Invocation: Can proactively use code to enhance visual information, as demonstrated in case studies involving medical imaging where it crops and magnifies regions of interest for clearer analysis.
- Progressive Post-training: Uses a four-stage training procedure (Scientific Reasoning SFT, Thinking-with-Images Cold-Start SFT, and two stages of Reinforcement Learning (RL) with the SAPO algorithm) to progressively unlock and optimize its scientific reasoning and image operation capabilities.
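The four-stage post-training procedure in the last bullet can be written down as a small data structure. This is only an illustrative sketch: the `Stage` class, its field names, and the labels "RL stage 1" and "RL stage 2" are hypothetical, the stage order is assumed to follow the order in which the stages are listed, and no hyperparameters or datasets are implied.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Stage:
    """One post-training stage (hypothetical structure, for illustration only)."""
    name: str
    method: str          # "SFT" or "RL"
    algorithm: str = ""  # filled in only where the model card names one


# Stage names are taken from the model card; the ordering is an assumption.
PIPELINE = (
    Stage("Scientific Reasoning SFT", "SFT"),
    Stage("Thinking-with-Images Cold-Start SFT", "SFT"),
    Stage("RL stage 1", "RL", "SAPO"),
    Stage("RL stage 2", "RL", "SAPO"),
)


def describe(pipeline: Sequence[Stage]) -> List[str]:
    """Render the pipeline as numbered, human-readable lines."""
    lines = []
    for index, stage in enumerate(pipeline, start=1):
        suffix = f" ({stage.algorithm})" if stage.algorithm else ""
        lines.append(f"{index}. {stage.name} [{stage.method}]{suffix}")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(PIPELINE)))
```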
Good For
- Analyzing and solving complex scientific problems involving both text and images.
- Interpreting dense scientific charts and high-resolution remote sensing, microscopic, and astronomical images.
- Applications requiring dynamic image manipulation during the reasoning process, such as medical image analysis and academic figure Q&A.