S1-VL-32B: Scientific Multimodal Reasoning Model

S1-VL-32B, developed by the ScienceOne AI team at the Chinese Academy of Sciences, is a 33.4-billion-parameter multimodal large language model optimized for scientific domains. It introduces two core reasoning paradigms for complex scientific problems: Scientific Reasoning and Thinking with Images.

Key Capabilities

Scientific Reasoning: Uses chain-of-thought reasoning to analyze and solve intricate, multi-step scientific problems in disciplines such as mathematics, physics, chemistry, astronomy, earth sciences, and biology.
Thinking with Images: Lets the model actively invoke code tools for image operations (e.g., cropping, zooming, enhancement, annotation) during its reasoning process. This is particularly effective for interpreting dense scientific charts and high-resolution remote-sensing, microscopy, and astronomical images.
Cross-disciplinary Data Pipeline: Uses a cross-disciplinary data processing pipeline to produce high-quality visual reasoning trajectories for training.
Four-stage Progressive Post-training: The stages are Scientific Reasoning SFT, Thinking-with-Images Cold-Start SFT, and two stages of reinforcement learning (RL) using the SAPO algorithm. Together they progressively unlock and refine the model's scientific reasoning and image manipulation abilities.
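The model card does not specify the tool interface S1-VL-32B uses. As a rough illustration of what invoking code tools for image operations during reasoning can look like, here is a minimal sketch that assumes a grayscale image stored as nested lists and a hypothetical registry of crop, zoom, and contrast-enhancement functions that a model could call by name, one after another. The function names, argument shapes, and call format are all assumptions, not the model's actual API.

```python
from typing import Callable, Dict, List, Tuple

# Grayscale image as rows of pixel intensities in [0, 255].
Image = List[List[int]]


def crop(img: Image, box: Tuple[int, int, int, int]) -> Image:
    """Return the region (left, top, right, bottom); right and bottom are exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]


def zoom(img: Image, factor: int) -> Image:
    """Nearest-neighbour upscaling by an integer factor."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: Image = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        # Copy each widened row so the output rows do not alias each other.
        out.extend([list(wide) for _ in range(factor)])
    return out


def enhance_contrast(img: Image) -> Image:
    """Linearly stretch intensities so the darkest pixel maps to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [list(row) for row in img]
    return [[round((px - lo) * 255 / (hi - lo)) for px in row] for row in img]


# Hypothetical tool registry: names and argument shapes are illustrative only.
TOOLS: Dict[str, Callable[..., Image]] = {
    "crop": crop,
    "zoom": zoom,
    "enhance_contrast": enhance_contrast,
}


def run_tool_calls(img: Image, calls: List[dict]) -> Image:
    """Apply a sequence of tool calls, each {"name": ..., "args": {...}}, in order."""
    for call in calls:
        img = TOOLS[call["name"]](img, **call.get("args", {}))
    return img


if __name__ == "__main__":
    chart = [
        [10, 10, 10, 10],
        [10, 40, 60, 10],
        [10, 50, 70, 10],
        [10, 10, 10, 10],
    ]
    # Inspect the centre of the "chart": crop, boost contrast, then enlarge.
    result = run_tool_calls(chart, [
        {"name": "crop", "args": {"box": (1, 1, 3, 3)}},
        {"name": "enhance_contrast"},
        {"name": "zoom", "args": {"factor": 2}},
    ])
    for row in result:
        print(row)
```

Chaining operations this way mirrors the workflow described above: narrow to a region of interest, enhance it, then enlarge it for closer reading. A real system would more likely operate on image files through a library such as Pillow rather than on nested lists.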

Evaluation Highlights

S1-VL-32B performs strongly across 13 benchmarks covering scientific multimodal reasoning and image manipulation reasoning. It shows significant advantages on widely used benchmarks such as MMMU, MathVision, and VRSBench-MINI, surpassing its base model, Qwen3-VL-32B, and remaining competitive with much larger open-source models and even closed-source flagship models such as Gemini 2.5 Pro and GPT-5. Notably, it ranks first on all five image-operation reasoning benchmarks, outperforming models of comparable and larger scale as well as dedicated "Thinking with Images" models.

Good for

Academic figure Q&A
Medical image analysis
Chemical structure recognition
Interpreting dense scientific charts and high-resolution imagery
Tasks requiring image manipulation (cropping, zooming, enhancement) as part of the reasoning process
