Name: ZiyiYe/Con-J-Qwen2-7B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ZiyiYe

Overview

Con-J-Qwen2-7B is a 7.6-billion-parameter generative judge model developed by ZiyiYe. Built on Qwen2-7B-Instruct, it compares two candidate answers to a question, states which one it prefers, and gives a natural-language rationale for that preference. Using Direct Preference Optimization (DPO) on preference data, the model learns to generate both positive and negative judgments, each accompanied by a rationale.
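As a rough illustration of the DPO objective mentioned above, the sketch below computes the standard per-example DPO loss from sequence log-probabilities. It is not taken from the Con-J training code: the function name, argument names, and the default beta are assumptions chosen for illustration, and in this setting the "chosen" and "rejected" sequences would presumably be a judgment that agrees with the preference label and one that contradicts it.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin))).

    All names and the default beta are illustrative assumptions,
    not values from the Con-J training setup.
    """
    # How much more the policy likes each sequence than the frozen reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals log(1 + exp(-z)); branch for numerical stability.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# The loss shrinks as the policy favors the chosen judgment more than the reference does.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

When the policy and reference agree exactly, the loss is log 2; it falls below that as the policy shifts probability toward the chosen judgment.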

Key Capabilities

Generative Judgment: Evaluates two candidate answers to a question and determines which is superior.
Rationale Generation: Provides detailed, natural language explanations for its judgments, enhancing transparency and interpretability.
Preference-based Training: Trained on self-generated contrastive judgment pairs derived from the Skywork/Skywork-Reward-Preference-80K-v0.1 dataset.
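The pairwise judging flow described above could be wrapped roughly as follows. This is a minimal sketch only: the prompt wording, the "Preferred: A/B" verdict format, and both helper names are hypothetical and are not the template the model was trained with; the full model card is the place to look for the actual prompt.

```python
import re


def build_judge_prompt(question, answer_a, answer_b):
    """Assemble a pairwise-comparison prompt (hypothetical wording)."""
    return (
        "Below is a question and two candidate answers.\n\n"
        f"Question: {question}\n\n"
        f"Answer A: {answer_a}\n\n"
        f"Answer B: {answer_b}\n\n"
        "Explain which answer is better, then finish with a line of the form "
        "'Preferred: A' or 'Preferred: B'."
    )


def parse_verdict(judgment):
    """Extract 'A' or 'B' from the judge's text, or None if no verdict is found."""
    match = re.search(r"Preferred:\s*([AB])\b", judgment)
    return match.group(1) if match else None


prompt = build_judge_prompt("What is 2 + 2?", "4", "5")
verdict = parse_verdict("Answer A is arithmetically correct.\nPreferred: A")
```

Keeping the rationale and the verdict in the same completion means the explanation can be logged next to the decision it supports.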

Performance Highlights

On the reward-model benchmarks listed here, Con-J-Qwen2-7B scores higher than both GPT-4o and the much larger Llama3.1-70B. It achieves:

81.0 on Infinity-Preference, surpassing GPT-4o (75.0) and Llama3.1-70B (64.0).
73.0 on Ultra-Feedback, outperforming GPT-4o (72.2) and Llama3.1-70B (71.4).
79.6 on Reward-Bench Chat-H, ahead of GPT-4o (74.3) and Llama3.1-70B (70.2).
88.0 on Reward-Bench Safety, exceeding GPT-4o (87.6) and Llama3.1-70B (82.8).

Good For

Automated evaluation of LLM outputs.
Providing detailed feedback and rationales for answer quality.
Developing systems that require nuanced judgment of text coherence, accuracy, and coverage.
Research into generative judge models and preference-based learning.
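For the automated-evaluation use case, a batch of comparisons can be tallied with a small loop like the one below. It is a sketch under stated assumptions: `tally_preferences` is a hypothetical helper, and `longer_answer_judge` is a trivial stand-in so the example runs without a model; in practice the judge callable would send each pair to Con-J-Qwen2-7B and parse its verdict.

```python
from collections import Counter


def tally_preferences(examples, judge):
    """Count how often the judge prefers answer A or B over a list of examples.

    `judge` is any callable (question, answer_a, answer_b) -> "A", "B", or None.
    """
    counts = Counter()
    for question, answer_a, answer_b in examples:
        verdict = judge(question, answer_a, answer_b)
        # Anything other than a clean "A"/"B" is tracked separately.
        counts[verdict if verdict in ("A", "B") else "unparsed"] += 1
    return counts


def longer_answer_judge(question, answer_a, answer_b):
    """Stand-in judge for illustration only: prefers the longer answer."""
    return "A" if len(answer_a) >= len(answer_b) else "B"


examples = [
    ("What is 2 + 2?", "4", "The answer is 5."),
    ("Capital of France?", "Paris is the capital.", "Lyon"),
]
counts = tally_preferences(examples, longer_answer_judge)
```

Tracking unparsed verdicts separately makes it visible when the judge's output does not follow the expected format.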
