lightonai/OriOn-Qwen-SR1
The lightonai/OriOn-Qwen-SR1 is a 33.4 billion parameter vision-language model developed by LightOn, built upon the Qwen3-VL-32B-Instruct architecture. It achieves state-of-the-art performance on MMLongBenchDoc (58.3 accuracy) for long-document visual question answering by internalizing synthetic reasoning traces through low-strength model merging. This model excels at multi-page document reasoning and long-context visual document understanding in enterprise, legal, scientific, and financial domains, offering superior performance with 7x fewer parameters than larger alternatives.
Loading preview...
OriOn-Qwen Synthetic Reasoning 1: Long-Document VQA
LightOn's OriOn-Qwen-SR1 is a 33.4 billion parameter vision-language model based on Qwen3-VL-32B-Instruct, specifically engineered for advanced long-document visual question answering (VQA).
Key Capabilities
- SOTA Performance: Achieves 58.3 accuracy on MMLongBenchDoc, outperforming models like
Qwen3-VL-235B-A22B-Instruct(57.0) with significantly fewer parameters. - Internalized Reasoning: Utilizes a novel synthetic reasoning pipeline and low-strength model merging (α=0.25) to internalize reasoning traces. This means the model benefits from complex reasoning without explicitly generating thinking tokens, maintaining efficient inference.
- Controllable Reasoning: Reasoning capabilities can be activated at inference time by including the
<cot>control token in the system prompt, leading to a +3.8 MMLBD improvement. - Drop-in Replacement: Compatible with the
Qwen3VLForConditionalGenerationandAutoProcessorAPI, making it easy to integrate for users familiar with the Qwen3-VL family. - Long Context: Supports a context length of 262,144 tokens, enabling processing of extensive multi-page documents.
Good For
- Long PDF and Slide-Deck QA: Designed for question answering across documents up to 250+ pages.
- Multi-Page Document Reasoning: Excels in tasks requiring cross-page synthesis and understanding.
- Visual Document Understanding: Ideal for enterprise, legal, scientific, and financial applications involving long-context visual documents.