FudanCVL/Unison-Judge
FudanCVL/Unison-Judge is an 8-billion-parameter vision-language model, fine-tuned from Qwen3-VL-8B, designed to function as a local automatic judge for the Unison benchmark. It evaluates the outputs of Unified Multimodal Models (UMMs) across four unified tasks: Image Captioning (IC), Unconditional Image Generation (UGG), Guided Image Generation (GGU), and Multimodal Editing (ME). Because scoring runs locally without a hosted API, the model is suited to consistent, automated evaluation of multimodal model performance.
Unison-Judge: A Local Automatic Vision-Language Judge
Unison-Judge is an 8 billion parameter vision-language model, fine-tuned from Qwen3-VL-8B by FudanCVL. Its primary purpose is to serve as a local, API-free automatic judge for the Unison benchmark, enabling consistent evaluation of Unified Multimodal Models (UMMs).
Key Capabilities
- Automatic Scoring: Evaluates UMM outputs across four distinct unified tasks:
  - IC (Image Captioning)
  - UGG (Unconditional Image Generation)
  - GGU (Guided Image Generation)
  - ME (Multimodal Editing)
- Local Operation: Does not require a hosted API, facilitating private and efficient evaluations.
- Consistency Data: The model's consistency is assessed on 231 evaluation cases spanning all four tasks, covering various question types and outputs from UMMs such as BAGEL-7B-MoT, OmniGen2, SEED-X-17B, and UniWorld-V1.
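The source does not say how consistency is measured over the evaluation cases. As one illustrative possibility, the sketch below computes a per-task agreement rate between the judge's verdicts and reference verdicts. The record fields (`task`, `judge`, `reference`), the boolean verdict format, and the function name `agreement_by_task` are all assumptions made for this example and are not part of the Unison release.

```python
from collections import defaultdict

# Task codes as listed in this model card.
TASKS = ("IC", "UGG", "GGU", "ME")

def agreement_by_task(cases):
    """Return {task: fraction of cases where the judge matches the reference}.

    Each case is a dict with hypothetical keys:
      "task"      - one of TASKS
      "judge"     - verdict produced by the judge model
      "reference" - verdict the judge is compared against
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for case in cases:
        task = case["task"]
        if task not in TASKS:
            raise ValueError(f"unknown task: {task!r}")
        totals[task] += 1
        hits[task] += int(case["judge"] == case["reference"])
    return {task: hits[task] / totals[task] for task in totals}

# Toy records for illustration only; these are not drawn from the 231 real cases.
toy_cases = [
    {"task": "IC", "judge": True, "reference": True},
    {"task": "IC", "judge": False, "reference": True},
    {"task": "ME", "judge": True, "reference": True},
]
print(agreement_by_task(toy_cases))
```

Tasks with no cases are simply absent from the result, which avoids a division by zero.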
Use Cases
This model is ideal for researchers and developers who need to automatically and consistently score the outputs of multimodal models, particularly within the context of the Unison benchmark. Its local operation makes it suitable for environments where external API calls are not feasible or desired.
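A scoring loop for this use case might look like the minimal sketch below. The judge is passed in as a plain callable so that local inference can be swapped in without changing the aggregation code. The case fields, the `score_outputs` helper, and `stub_judge` are hypothetical names introduced here; the stub only stands in for a call to the locally loaded model and does not reflect Unison-Judge's actual prompt or output format.

```python
from statistics import mean

def score_outputs(cases, judge):
    """Score each case with `judge` and return the mean score per task.

    `judge` is any callable that maps a case dict to a numeric score.
    In practice it would wrap inference with the locally loaded judge model.
    """
    per_task = {}
    for case in cases:
        per_task.setdefault(case["task"], []).append(float(judge(case)))
    return {task: mean(scores) for task, scores in per_task.items()}

def stub_judge(case):
    # Placeholder logic (assumption): a non-empty output counts as a pass.
    # Replace this with a call to the real judge model.
    return 1.0 if case["output"] else 0.0

# Hypothetical cases; "output" would normally be a UMM's caption or image path.
demo_cases = [
    {"task": "IC", "output": "a cat on a sofa"},
    {"task": "IC", "output": ""},
    {"task": "GGU", "output": "generated.png"},
]
print(score_outputs(demo_cases, stub_judge))
```

Keeping the judge behind a callable also makes the aggregation easy to unit-test without loading an 8B model.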