Jiaqi-hkust/Robust-R1-SFT
- Modality: vision
- Model size: 3B
- Quantization: BF16
- Context length: 32k
- Published: Nov 9, 2025
- License: MIT
- Architecture: Transformer (open weights)
# Robust-R1-SFT: Degradation-Aware Visual Reasoning
Robust-R1-SFT is a 3 billion parameter vision-language model developed by Jiaqi-hkust, fine-tuned from Qwen2.5-VL-Base. It targets robust visual understanding by incorporating degradation-aware reasoning: the model is trained to account for image quality issues while answering questions about an image.
## Key Capabilities
- Degradation-Aware Reasoning: Designed to perform robustly in visual tasks even when input images are degraded or imperfect.
- Visual-Language Integration: Combines visual perception with language understanding, enabling complex visual question answering and description tasks.
- Specialized Fine-tuning: Fine-tuned on the dedicated Robust-R1 dataset, which focuses on enhancing resilience to visual degradations.
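The kinds of degradations such training targets can be simulated when evaluating robustness. A minimal sketch, not from the model card, that corrupts an image array with additive Gaussian noise and measures the quality drop with PSNR (the noise level and helper names are illustrative assumptions, not details of the Robust-R1 dataset):

```python
import numpy as np

def add_gaussian_noise(image, sigma=25.0, seed=0):
    """Corrupt an 8-bit image array with additive Gaussian noise (illustrative degradation)."""
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def psnr(clean, degraded):
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    mse = np.mean((clean.astype(np.float64) - degraded.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

# Example: a synthetic 64x64 RGB image at mid-gray
clean = np.full((64, 64, 3), 128, dtype=np.uint8)
degraded = add_gaussian_noise(clean, sigma=25.0)
print(f"PSNR after degradation: {psnr(clean, degraded):.1f} dB")
```

Feeding both the clean and degraded versions of an image to the model is a simple way to probe whether its answers stay consistent as quality drops.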
## When to Use
- Applications requiring reliable visual understanding in real-world conditions where image quality may vary.
- Research into robust AI systems and degradation-aware models.
- Tasks involving visual question answering or image captioning where robustness to noise or imperfections is critical.
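For use cases like these, the model can presumably be queried through the standard Qwen2.5-VL chat interface in Hugging Face `transformers`, since it is fine-tuned from Qwen2.5-VL-Base. A minimal sketch under that assumption (the image path, question text, and generation settings are placeholders; the model card does not specify an inference recipe):

```python
def build_vqa_messages(image_path, question):
    """Build a single-turn VQA request in the Qwen2.5-VL chat format."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

def run_robust_vqa(image_path, question, model_id="Jiaqi-hkust/Robust-R1-SFT"):
    """Answer one visual question. Downloads the 3B checkpoint on first call.

    Assumption: the fine-tune keeps the base model's class and processor.
    """
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    messages = build_vqa_messages(image_path, question)
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        text=[text], images=[Image.open(image_path)], return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    return processor.batch_decode(out, skip_special_tokens=True)[0]
```

A call like `run_robust_vqa("degraded_photo.jpg", "What is in this image?")` would return the model's answer, reasoning about any visible degradation along the way.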