CodeGoat24/UnifiedReward-2.0-qwen3vl-2b

Vision | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Nov 8, 2025 | License: MIT | Architecture: Transformer | Open Weights

CodeGoat24's UnifiedReward-2.0-qwen3vl-2b is a 2-billion-parameter unified reward model based on Qwen3-VL-2B-Instruct, designed to assess both multimodal understanding and multimodal generation. It supports pairwise ranking and pointwise scoring, which makes it usable for preference alignment of vision models. It evaluates content across four task types: image generation, image understanding, video generation, and video understanding.


UnifiedReward-2.0-qwen3vl-2b Overview

UnifiedReward-2.0-qwen3vl-2b is a 2-billion-parameter reward model developed by CodeGoat24 and built on Qwen3-VL-2B-Instruct. Its defining feature is a "unified" design: a single model assesses both multimodal understanding and multimodal generation tasks. It supports both pairwise ranking and pointwise scoring, so it fits a range of evaluation scenarios, in particular preference alignment for vision models.

Key Capabilities

  • Multimodal Assessment: Evaluates content across four task types: image generation, image understanding, video generation, and video understanding.
  • Flexible Scoring: Supports both pairwise ranking (comparing two outputs) and pointwise scoring (assigning a score to a single output).
  • Vision Model Preference Alignment: Specifically designed to help align the outputs of vision models with human preferences.
  • Comprehensive Coverage: Unlike many existing reward models that specialize in one modality (e.g., only image generation or only video understanding), UnifiedReward-2.0 offers broad coverage across all four key multimodal areas.
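
The difference between pointwise scoring and pairwise ranking, and how either can feed preference alignment, can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual inference code: `toy_pointwise_score` is a hypothetical stand-in that needs no model, `pairwise_rank` derives a preference from two pointwise scores purely for demonstration (the real model is described as supporting pairwise ranking directly), and the `prompt` / `chosen` / `rejected` keys are an assumed record layout.

```python
from typing import Callable, Dict

# Hypothetical scorer type: (prompt, output) -> score, higher is better.
Scorer = Callable[[str, str], float]


def toy_pointwise_score(prompt: str, output: str) -> float:
    """Toy stand-in for a pointwise reward call (NOT the real model).

    Counts how many words of the output also appear in the prompt.
    """
    prompt_words = set(prompt.lower().split())
    return float(sum(1 for word in output.lower().split() if word in prompt_words))


def pairwise_rank(score: Scorer, prompt: str, output_a: str, output_b: str) -> int:
    """Return 0 if output_a is preferred, 1 if output_b is; ties go to output_a."""
    return 0 if score(prompt, output_a) >= score(prompt, output_b) else 1


def build_preference_pair(
    score: Scorer, prompt: str, output_a: str, output_b: str
) -> Dict[str, str]:
    """Turn one comparison into a chosen/rejected record (assumed layout)."""
    winner = pairwise_rank(score, prompt, output_a, output_b)
    chosen, rejected = (output_a, output_b) if winner == 0 else (output_b, output_a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


pair = build_preference_pair(
    toy_pointwise_score, "a red car on a road", "a blue boat", "a red car"
)
print(pair)
```

In practice the toy scorer would be replaced by a call into the reward model, with the image or video passed alongside the text; the surrounding bookkeeping stays the same.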

What Makes It Different

This model is presented as the first unified reward model to cover this range of multimodal tasks. Other reward models are narrower: PickScore, HPS, and ImageReward focus solely on image generation, while LLaVA-Critic and IXC-2.5-Reward target image and video understanding. UnifiedReward-2.0 covers all of these in one model, which reduces the need to maintain multiple specialized reward models in complex multimodal systems.

Good For

  • Developers needing a single reward model for diverse multimodal evaluation tasks.
  • Researchers and engineers working on preference alignment for vision models.
  • Evaluating and ranking outputs from image generation, image understanding, video generation, and video understanding models.
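
For the ranking use case, one simple way to order more than two candidates with a pairwise judge is a round-robin win count. The sketch below assumes a hypothetical `prefer(a, b)` callable that returns 0 when `a` wins and 1 when `b` wins; `toy_prefer` is a placeholder that favors longer captions so the example runs without a model, and the win-count scheme is an illustrative choice rather than a method documented for this model.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def rank_by_wins(
    candidates: Sequence[str], prefer: Callable[[str, str], int]
) -> List[str]:
    """Rank candidates by the number of pairwise comparisons each one wins."""
    wins = [0] * len(candidates)
    # Compare every unordered pair exactly once.
    for i, j in combinations(range(len(candidates)), 2):
        if prefer(candidates[i], candidates[j]) == 0:
            wins[i] += 1
        else:
            wins[j] += 1
    # sorted() is stable, so candidates with equal wins keep their input order.
    order = sorted(range(len(candidates)), key=lambda k: -wins[k])
    return [candidates[k] for k in order]


def toy_prefer(a: str, b: str) -> int:
    """Placeholder judge (NOT the real model): prefers the longer caption."""
    return 0 if len(a) >= len(b) else 1


ranked = rank_by_wins(["cat", "a cat on a mat", "a cat"], toy_prefer)
print(ranked)
```

Round-robin comparison needs N(N-1)/2 judge calls for N candidates, so for large candidate sets pointwise scoring followed by a single sort is usually cheaper.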