glab-caltech/VALOR-8B

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Dec 11, 2025License:mitArchitecture:Transformer Open Weights Cold

VALOR-8B is an 8 billion parameter Qwen3-based model developed by glab-caltech, specifically fine-tuned using Reinforcement Learning (RL) for visual reasoning tasks. This model is designed to process and reason about multimodal inputs, as detailed in the paper "No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers." It excels in scenarios requiring visual understanding and logical inference without explicit labels.

Loading preview...

VALOR-8B: RL-Tuned Visual Reasoner

VALOR-8B is an 8 billion parameter language model built upon the Qwen3 architecture, developed by glab-caltech. Its core distinction lies in its training methodology: it has been fine-tuned using Reinforcement Learning (RL) specifically for visual reasoning tasks. This approach, detailed in the paper "No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers," enables the model to perform complex visual inferences.

Key Capabilities

  • Multimodal Reasoning: Designed to understand and reason across visual and textual inputs.
  • Reinforcement Learning Fine-tuning: Utilizes an RL-based training paradigm for enhanced reasoning abilities.
  • Label-Free Learning: Focuses on learning visual reasoning without relying on explicit labels, as highlighted in its foundational research.

Good For

  • Applications requiring advanced visual understanding and logical deduction.
  • Research into multimodal AI and reinforcement learning for reasoning tasks.
  • Scenarios where traditional label-dependent training is challenging or unavailable.

For more in-depth information, refer to the project webpage and the associated research paper.