Skywork/Skywork-Critic-Llama-3.1-8B
Skywork-Critic-Llama-3.1-8B is an 8-billion-parameter judge model developed by the SkyworkAI Alignment Team, fine-tuned from Meta's Llama-3.1-8B-Instruct. It is purpose-built for pairwise preference evaluation: given two candidate responses, it renders a nuanced judgment on which is of higher quality or better suited to the task. This makes it well suited to data improvement, evaluation, and reward-modeling applications.
Key Capabilities & Training:
- Pairwise Preference Evaluation: Compares two candidate responses and delivers a verdict on which is superior (a minimal usage sketch follows this list).
- Diverse Training Data: Fine-tuned on a high-quality mix of datasets, including:
  - Cleaned open-source data (HelpSteer2, OffsetBias, WildGuard, Magpie DPO series).
  - A limited amount of in-house human-annotated data, primarily in Chinese, covering pointwise scoring and pairwise comparisons.
  - Synthetic critic data generated with "self-taught"-style methods, which create inferior responses by modifying instructions or introducing subtle errors (sketched after this list).
  - Critic-related chat data to preserve conversational ability.
- Instruction-Tuning Methodology: Employs instruction-tuning for both pairwise preference evaluation and general chat tasks.
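The following is a minimal sketch of pairwise judging with the Hugging Face transformers library. The judge prompt shown is an assumption modeled on common MT-Bench-style templates, and the "[[A]]"/"[[B]]" verdict format is likewise assumed; consult the official model card for the exact template the model was trained with.

```python
# Minimal pairwise-judging sketch. The prompt template and the
# "[[A]]"/"[[B]]" verdict convention are assumptions, not confirmed
# against the official model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-Critic-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

JUDGE_TEMPLATE = """Please act as an impartial judge and evaluate the quality \
of the responses provided by two AI assistants to the user question below. \
Output your final verdict strictly as "[[A]]" if assistant A is better or \
"[[B]]" if assistant B is better.

[User Question]
{question}

[The Start of Assistant A's Answer]
{answer_a}
[The End of Assistant A's Answer]

[The Start of Assistant B's Answer]
{answer_b}
[The End of Assistant B's Answer]"""

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Return "A", "B", or "?" if no verdict token is found."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
    completion = tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    if "[[A]]" in completion:
        return "A"
    if "[[B]]" in completion:
        return "B"
    return "?"
```

Greedy decoding (do_sample=False) keeps verdicts deterministic across repeated runs.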
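To make the synthetic-critic-data item above concrete, here is a hedged sketch of a "self-taught"-style recipe: perturb an instruction, answer the perturbed instruction with a base model, and pair the result, as the inferior response, with the original instruction's good response. Every model choice, prompt, and field name here is illustrative; this is not the Skywork pipeline.

```python
# Illustrative "self-taught"-style data generation: responses to a subtly
# perturbed instruction serve as rejected answers for the original one.
# The generator model and prompts are assumptions, not the Skywork recipe.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed base generator
)

def make_inferior_pair(instruction: str, good_response: str) -> dict:
    # Step 1: ask the generator for a subtly modified instruction.
    perturbed = generator(
        [{"role": "user", "content":
          "Rewrite this instruction so it asks for something slightly "
          "different, changing exactly one key detail:\n" + instruction}],
        max_new_tokens=128,
    )[0]["generated_text"][-1]["content"]

    # Step 2: answer the perturbed instruction. Relative to the ORIGINAL
    # instruction, this answer is off-target, hence "inferior" by design.
    bad_response = generator(
        [{"role": "user", "content": perturbed}],
        max_new_tokens=256,
    )[0]["generated_text"][-1]["content"]

    return {
        "instruction": instruction,
        "chosen": good_response,   # on-target reference answer
        "rejected": bad_response,  # answer to the perturbed instruction
    }
```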
Performance Highlights:
- RewardBench Leaderboard: As of September 2024, Skywork-Critic-Llama-3.1-8B ranks first on RewardBench for generative models under 10 billion parameters, achieving an Overall Score of 89.0.
Ideal Use Cases:
- Data Improvement: Identifying and refining high-quality data for further model training.
- Evaluation: Objectively assessing the output quality of other AI models or systems.
- Reward Modeling: Generating reward signals or preference labels for reinforcement learning from human feedback (RLHF) pipelines (see the sketch after this list).
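As a concrete illustration of the reward-modeling use case, the sketch below reuses the hypothetical judge() helper from the earlier example to turn pairwise verdicts into chosen/rejected records of the kind consumed by DPO-style trainers. The record schema and the order-swapping check are assumptions, not a Skywork-prescribed procedure.

```python
# Convert pairwise verdicts into preference records (the DPO-style schema
# is illustrative). Each pair is judged in both orders to control for the
# position bias common to LLM judges; inconsistent verdicts are dropped.
def label_preferences(samples):
    """samples: iterable of (question, response_1, response_2) triples."""
    records = []
    for question, r1, r2 in samples:
        first = judge(question, r1, r2)
        second = judge(question, r2, r1)  # swapped presentation order
        if first == "A" and second == "B":
            records.append({"prompt": question, "chosen": r1, "rejected": r2})
        elif first == "B" and second == "A":
            records.append({"prompt": question, "chosen": r2, "rejected": r1})
        # Anything else (ties, "?" verdicts) is discarded rather than guessed.
    return records
```

Judging each pair twice roughly doubles the cost, but filtering out position-inconsistent verdicts is a common way to raise label quality before RLHF or DPO training.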
This model is a robust tool for applications requiring precise and objective comparative analysis of AI-generated content.