CodeGoat24/UnifiedReward-Flex-qwen3vl-32b

VISION · Concurrency Cost: 2 · Model Size: 33.4B · Quant: FP8 · Ctx Length: 32k · Published: Feb 2, 2026 · License: MIT · Architecture: Transformer · Open Weights · Cold

CodeGoat24/UnifiedReward-Flex-qwen3vl-32b is a 33.4 billion parameter unified personalized reward model for vision generation, developed by CodeGoat24. The model couples reward modeling with flexible, context-adaptive reasoning, making it suitable for tasks that require nuanced evaluation of generated visual content. It is designed to provide personalized reward signals that improve the quality and relevance of vision generation outputs.


UnifiedReward-Flex-qwen3vl-32b Overview

CodeGoat24/UnifiedReward-Flex-qwen3vl-32b is a 33.4 billion parameter model specifically designed as a unified personalized reward model for vision generation. This model integrates reward modeling with a flexible, context-adaptive reasoning approach, aiming to provide more nuanced and personalized feedback for generated visual content.

Key Capabilities

  • Personalized Reward Modeling: Focuses on generating rewards tailored to specific contexts and user preferences in vision generation tasks.
  • Context-Adaptive Reasoning: Employs flexible reasoning to adapt its reward mechanisms based on the input context.
  • Vision Generation Enhancement: Designed to improve the quality and relevance of outputs from vision generation models by providing sophisticated reward signals.
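To make the capabilities above concrete, a pairwise evaluation request to a vision reward model of this kind might be assembled as sketched below. The prompt template and the `build_pairwise_prompt` helper are hypothetical illustrations, not the model's actual API; the real prompt format is defined in the project's GitHub repository.

```python
# Hypothetical sketch: assembling a pairwise-comparison prompt for a
# vision reward model. The wording and <image> placeholder syntax are
# illustrative assumptions, not the model's documented format.

def build_pairwise_prompt(caption: str, image_a: str, image_b: str) -> str:
    """Build a prompt asking the reward model to judge which of two
    generated images better matches the given caption."""
    return (
        "You are an expert evaluator of generated images.\n"
        f"Caption: {caption}\n"
        f"Image 1: <image>{image_a}</image>\n"
        f"Image 2: <image>{image_b}</image>\n"
        "Which image better matches the caption? Answer 'Image 1' or 'Image 2'."
    )

prompt = build_pairwise_prompt(
    caption="a red bicycle leaning against a brick wall",
    image_a="gen_001.png",
    image_b="gen_002.png",
)
print(prompt)
```

In practice the assembled prompt, together with the referenced images, would be passed to the model's processor for inference.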

Good For

  • Evaluating Generated Images: Ideal for scenarios where a nuanced, personalized assessment of generated visual content is required.
  • Reinforcement Learning from Human Feedback (RLHF) for Vision: Can be integrated into pipelines that use reward models to fine-tune vision generation models.
  • Research in Vision-Language Models: Useful for researchers exploring advanced reward mechanisms and personalized feedback in multimodal AI.
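One common way to use reward scores like these in a generation pipeline is best-of-N selection: sample several candidates, score each with the reward model, and keep the highest-scoring one. The sketch below illustrates the pattern with a stubbed scorer standing in for an actual call to the reward model; `best_of_n` and `score_image` are hypothetical names, not part of this project's codebase.

```python
# Hypothetical sketch of best-of-N sampling driven by a reward model:
# generate several candidate images, score each, keep the best one.
# `score_image` is a stand-in for real reward-model inference.

from typing import Callable, List, Tuple

def best_of_n(candidates: List[str],
              score_image: Callable[[str], float]) -> Tuple[str, float]:
    """Return the candidate with the highest reward score."""
    scored = [(c, score_image(c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])

# Stub scores standing in for actual reward-model outputs.
fake_scores = {"img_a.png": 0.31, "img_b.png": 0.87, "img_c.png": 0.55}
best, score = best_of_n(list(fake_scores), fake_scores.__getitem__)
print(best, score)  # img_b.png 0.87
```

The same loop generalizes to RLHF-style fine-tuning, where the scores feed a policy-gradient update instead of a simple argmax.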

Further details, including the inference code, are available in the GitHub repository and the associated research paper.