Skywork/Skywork-Critic-Llama-3.1-8B
Skywork-Critic-Llama-3.1-8B is an 8-billion-parameter judge model developed by the SkyworkAI Alignment Team, fine-tuned from Meta's Llama-3.1-8B-Instruct. It is purpose-built for pairwise preference evaluation: given two candidate responses, it renders a nuanced judgment on which is of higher quality or better suited to the task. This makes it well suited to data improvement, evaluation, and reward-modeling applications.
Key Capabilities & Training:
- Pairwise Preference Evaluation: Compares two candidate responses and delivers a verdict on which is superior (a minimal usage sketch follows this list).
- Diverse Training Data: Fine-tuned on a high-quality mix of datasets, including:
  - Cleaned open-source data (HelpSteer2, OffsetBias, WildGuard, Magpie DPO series).
  - A limited amount of in-house human-annotated data, primarily in Chinese, covering pointwise scoring and pairwise comparisons.
  - Synthetic critic data generated with "self-taught"-style methods, which create inferior responses by modifying instructions or introducing subtle errors (sketched after this list).
  - Critic-related chat data to preserve conversational ability.
- Instruction-Tuning Methodology: Employs instruction-tuning for both pairwise preference evaluation and general chat tasks.
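The following is a minimal sketch of pairwise judging with the Hugging Face transformers library. The judge prompt shown is an assumption modeled on common MT-Bench-style templates, and the "[[A]]"/"[[B]]" verdict format is likewise assumed; consult the official model card for the exact template the model was trained with.

```python
# Minimal pairwise-judging sketch. The prompt template and the
# "[[A]]"/"[[B]]" verdict convention are assumptions, not confirmed
# against the official model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-Critic-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

JUDGE_TEMPLATE = """Please act as an impartial judge and evaluate the quality \
of the responses provided by two AI assistants to the user question below. \
Output your final verdict strictly as "[[A]]" if assistant A is better or \
"[[B]]" if assistant B is better.

[User Question]
{question}

[The Start of Assistant A's Answer]
{answer_a}
[The End of Assistant A's Answer]

[The Start of Assistant B's Answer]
{answer_b}
[The End of Assistant B's Answer]"""

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Return "A", "B", or "?" if no verdict token is found."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
    completion = tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    if "[[A]]" in completion:
        return "A"
    if "[[B]]" in completion:
        return "B"
    return "?"
```

Greedy decoding (do_sample=False) keeps verdicts deterministic across repeated runs.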
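To make the synthetic-critic-data item above concrete, here is a hedged sketch of a "self-taught"-style recipe: perturb an instruction, answer the perturbed instruction with a base model, and pair the result, as the inferior response, with the original instruction's good response. Every model choice, prompt, and field name here is illustrative; this is not the Skywork pipeline.

```python
# Illustrative "self-taught"-style data generation: responses to a subtly
# perturbed instruction serve as rejected answers for the original one.
# The generator model and prompts are assumptions, not the Skywork recipe.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed base generator
)

def make_inferior_pair(instruction: str, good_response: str) -> dict:
    # Step 1: ask the generator for a subtly modified instruction.
    perturbed = generator(
        [{"role": "user", "content":
          "Rewrite this instruction so it asks for something slightly "
          "different, changing exactly one key detail:\n" + instruction}],
        max_new_tokens=128,
    )[0]["generated_text"][-1]["content"]

    # Step 2: answer the perturbed instruction. Relative to the ORIGINAL
    # instruction, this answer is off-target, hence "inferior" by design.
    bad_response = generator(
        [{"role": "user", "content": perturbed}],
        max_new_tokens=256,
    )[0]["generated_text"][-1]["content"]

    return {
        "instruction": instruction,
        "chosen": good_response,   # on-target reference answer
        "rejected": bad_response,  # answer to the perturbed instruction
    }
```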
Performance Highlights:
- RewardBench Leaderboard: As of September 2024, Skywork-Critic-Llama-3.1-8B ranks first on RewardBench for generative models under 10 billion parameters, achieving an Overall Score of 89.0.
Ideal Use Cases:
- Data Improvement: Identifying and refining high-quality data for further model training.
- Evaluation: Objectively assessing the output quality of other AI models or systems.
- Reward Modeling: Generating reward signals or preference labels for reinforcement learning from human feedback (RLHF) pipelines (see the sketch after this list).
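As a concrete illustration of the reward-modeling use case, the sketch below reuses the hypothetical judge() helper from the earlier example to turn pairwise verdicts into chosen/rejected records of the kind consumed by DPO-style trainers. The record schema and the order-swapping check are assumptions, not a Skywork-prescribed procedure.

```python
# Convert pairwise verdicts into preference records (the DPO-style schema
# is illustrative). Each pair is judged in both orders to control for the
# position bias common to LLM judges; inconsistent verdicts are dropped.
def label_preferences(samples):
    """samples: iterable of (question, response_1, response_2) triples."""
    records = []
    for question, r1, r2 in samples:
        first = judge(question, r1, r2)
        second = judge(question, r2, r1)  # swapped presentation order
        if first == "A" and second == "B":
            records.append({"prompt": question, "chosen": r1, "rejected": r2})
        elif first == "B" and second == "A":
            records.append({"prompt": question, "chosen": r2, "rejected": r1})
        # Anything else (ties, "?" verdicts) is discarded rather than guessed.
    return records
```

Judging each pair twice roughly doubles the cost, but filtering out position-inconsistent verdicts is a common way to raise label quality before RLHF or DPO training.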
This model is a robust tool for applications requiring precise and objective comparative analysis of AI-generated content.