CodeGoat24/UnifiedReward-2.0-qwen3vl-32b
UnifiedReward-2.0-qwen3vl-32b is a 33.4-billion-parameter unified reward model developed by CodeGoat24 and based on the Qwen3-VL-32B-Instruct architecture. It is designed to assess both multimodal understanding and multimodal generation, and supports both pairwise ranking and pointwise scoring. It is intended for preference alignment of vision models and can evaluate image and video generation as well as image and video understanding tasks.
UnifiedReward-2.0-qwen3vl-32b Overview
UnifiedReward-2.0-qwen3vl-32b is a 33.4-billion-parameter reward model from CodeGoat24, built on the Qwen3-VL-32B-Instruct foundation. It is presented as the first unified reward model able to assess both multimodal understanding and multimodal generation. It supports both pairwise ranking (choosing the better of two candidates) and pointwise scoring (rating a single candidate), so the same model can serve a range of evaluation tasks.
Key Capabilities
- Multimodal Assessment: Evaluates both image and video generation and understanding.
- Unified Approach: Combines pairwise ranking and pointwise scoring within a single framework.
- Vision Model Alignment: Designed to provide preference signals for aligning vision models.
- Broad Coverage: Unlike many existing reward models, UnifiedReward covers image generation, image understanding, video generation, and video understanding in a single model.
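To illustrate how the two modes differ in practice, here is a minimal sketch that turns either kind of judgment into a (chosen, rejected) pair, the input format used by preference-optimization methods such as DPO. The callables `PointwiseFn` and `PairwiseFn` and the helpers `pair_from_pointwise` / `pair_from_pairwise` are hypothetical placeholders for prompting the model. They are not UnifiedReward's API, and the model's actual prompt templates and output parsing are not shown.

```python
from itertools import combinations
from typing import Callable, Sequence

# Hypothetical stand-ins for calls to the reward model. In practice each would
# prompt UnifiedReward with the instruction and candidate(s) and parse its reply;
# that prompt/parse step is assumed, not shown.
PointwiseFn = Callable[[str, str], float]    # (prompt, candidate) -> score
PairwiseFn = Callable[[str, str, str], int]  # (prompt, a, b) -> 0 if a wins, else 1


def pair_from_pointwise(prompt: str, candidates: Sequence[str],
                        score: PointwiseFn) -> tuple[str, str]:
    """Pointwise mode: rate each candidate alone, keep the best and worst."""
    ranked = sorted(candidates, key=lambda c: score(prompt, c))
    return ranked[-1], ranked[0]


def pair_from_pairwise(prompt: str, candidates: Sequence[str],
                       prefer: PairwiseFn) -> tuple[str, str]:
    """Pairwise mode: compare every pair (round robin) and count wins.

    Assumes candidates are unique, since wins are keyed by candidate.
    """
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        winner = a if prefer(prompt, a, b) == 0 else b
        wins[winner] += 1
    ranked = sorted(candidates, key=lambda c: wins[c])
    return ranked[-1], ranked[0]


# Toy judges for illustration only: both simply favour longer captions.
def toy_score(prompt: str, candidate: str) -> float:
    return float(len(candidate))


def toy_prefer(prompt: str, a: str, b: str) -> int:
    return 0 if len(a) >= len(b) else 1


if __name__ == "__main__":
    captions = ["a cat", "a cat on a red sofa", "cat sofa"]
    print(pair_from_pointwise("describe the image", captions, toy_score))
    print(pair_from_pairwise("describe the image", captions, toy_prefer))
```

The pointwise route needs one model call per candidate, while the round-robin pairwise route needs one call per pair (n(n-1)/2 calls). In exchange, pairwise ranking only asks the judge which of two candidates is better, never for an absolute score.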
Differentiators
This model stands out by covering all four major multimodal assessment categories in one model, which task-specific reward models such as PickScore (image generation), LLaVA-Critic (image understanding), or VideoScore (video generation) do not. Because it handles both ranking and scoring across diverse visual tasks, it is a practical tool for developing and refining multimodal AI systems.