Overview
Con-J-Qwen2-7B is a 7.6 billion parameter generative judge model developed by ZiyiYe. It is based on the Qwen2-7B-Instruct architecture and is specifically trained to act as a "generative judge" by evaluating two potential answers to a question and providing a rationale for its preference. The model learns to generate both positive and negative judgments, complete with natural language rationales, from preference data using Direct Preference Optimization (DPO).
Key Capabilities
- Generative Judgment: Evaluates two candidate answers to a question and determines which is superior.
- Rationale Generation: Provides detailed, natural language explanations for its judgments, enhancing transparency and interpretability.
- Preference-based Training: Utilizes self-generated contrastive judgment pairs from the Skywork/Skywork-Reward-Preference-80K-v0.1 dataset for robust training.
Performance Highlights
Con-J-Qwen2-7B demonstrates strong performance across various reward model benchmarks, often outperforming models in its size class and even larger models in specific categories. Notably, it achieves:
- 81.0 on Infinity-Preference, surpassing GPT-4o (75.0) and Llama3.1-70B (64.0).
- 73.0 on Ultra-Feedback, outperforming GPT-4o (72.2) and Llama3.1-70B (71.4).
- 79.6 on Reward-Bench Chat-H, significantly higher than GPT-4o (74.3) and Llama3.1-70B (70.2).
- 88.0 on Reward-Bench Safety, exceeding GPT-4o (87.6) and Llama3.1-70B (82.8).
Good For
- Automated evaluation of LLM outputs.
- Providing detailed feedback and rationales for answer quality.
- Developing systems that require nuanced judgment of text coherence, accuracy, and coverage.
- Research into generative judge models and preference-based learning.