CodeGoat24/UnifiedReward-Think-qwen3vl-32b

VISION | Concurrency Cost: 2 | Model Size: 33.4B | Quant: FP8 | Ctx Length: 32k | Published: Nov 25, 2025 | License: MIT | Architecture: Transformer | Open Weights | Cold

CodeGoat24/UnifiedReward-Think-qwen3vl-32b is a 33.4 billion parameter multimodal chain-of-thought (CoT) reward model developed by CodeGoat24 and based on the Qwen3-VL architecture. It is designed for multi-dimensional, step-by-step long-chain reasoning and covers reward tasks for both visual understanding and visual generation. Its main differentiator is that a single model applies CoT reasoning across all of these multimodal reward tasks.


UnifiedReward-Think-qwen3vl-32b: Multimodal CoT Reward Model

UnifiedReward-Think-qwen3vl-32b is a 33.4 billion parameter model developed by CodeGoat24 and presented as the first unified multimodal Chain-of-Thought (CoT) reward model. It is built to perform multi-dimensional, step-by-step long-chain reasoning across both visual understanding and visual generation reward tasks.

Key Capabilities

  • Unified Multimodal Reasoning: Integrates both visual understanding and generation reward tasks within a single framework.
  • Chain-of-Thought (CoT) Reward Modeling: Utilizes a step-by-step reasoning process to evaluate and provide rewards, enhancing the interpretability and accuracy of feedback.
  • Long-Chain Reasoning: Capable of handling complex tasks that require extended sequences of logical steps.
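A CoT reward model produces a reasoning trace before its verdict, so downstream code usually has to separate the two. The sketch below illustrates that step only. It assumes a hypothetical output convention in which the response ends with a line such as "Final score: 7.5"; this page does not document the model's actual output format, so the pattern and the function name `split_reasoning_and_score` are illustrative and would need adapting.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: the response ends with a line like "Final score: 7.5".
# This convention is hypothetical; the real model's format may differ.
_SCORE_RE = re.compile(r"final score\s*[:=]\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def split_reasoning_and_score(response: str) -> Tuple[str, Optional[float]]:
    """Return (reasoning_text, score); score is None if no verdict is found."""
    matches = list(_SCORE_RE.finditer(response))
    if not matches:
        # No verdict line: keep the whole text as reasoning.
        return response.strip(), None
    last = matches[-1]  # use the last match in case scores appear mid-reasoning
    reasoning = response[: last.start()].strip()
    return reasoning, float(last.group(1))
```

Keeping the reasoning text alongside the numeric score lets a reviewer inspect why a given reward was assigned, rather than only seeing the number.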

Use Cases

This model is particularly well-suited for applications requiring advanced reward mechanisms in multimodal contexts, such as:

  • Evaluating the quality of visual content generation.
  • Assessing the coherence and correctness of multimodal outputs.
  • Providing detailed, step-by-step feedback for complex visual and language tasks.
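One common way to apply a reward model to the first use case is best-of-N selection: score several candidate outputs and keep the highest-scoring one. The following is a minimal sketch of that pattern, not the authors' pipeline. The `score_fn` callable is a hypothetical stand-in for whatever code queries the reward model and returns a numeric score (or `None` when no score could be obtained).

```python
from typing import Callable, Optional, Sequence, Tuple


def pick_best(
    candidates: Sequence[str],
    score_fn: Callable[[str], Optional[float]],
) -> Tuple[Optional[str], Optional[float]]:
    """Return the candidate with the highest score and that score.

    `score_fn` is a hypothetical wrapper around a reward-model call.
    Candidates for which it returns None are skipped.
    """
    best: Optional[str] = None
    best_score: Optional[float] = None
    for candidate in candidates:
        score = score_fn(candidate)
        if score is None:
            continue  # could not obtain a reward for this candidate
        if best_score is None or score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Usage with a stub scorer standing in for the reward model.
stub_scores = {"image_a.png": 6.0, "image_b.png": 8.5, "image_c.png": None}
winner, winner_score = pick_best(list(stub_scores), stub_scores.get)
```

Ties resolve to the earliest candidate because the comparison is strict; whether that is appropriate depends on the application.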

For more in-depth technical details, including the underlying methodology and experimental results, refer to the associated paper and the project page.