CompassJudger-1-7B-Instruct: All-in-one Judge Model
Developed by OpenCompass, the CompassJudger-1 series is a family of specialized judge models designed for comprehensive AI model evaluation. This 7.6-billion-parameter instruction-tuned model, with a 131,072-token context length, offers robust capabilities for assessing other language models.
Key Capabilities
- Comprehensive Evaluation: Supports various evaluation methods, including scoring, comparison, and generating detailed assessment reviews.
- Formatted Output: Can output evaluation results and feedback in specific, structured formats, facilitating further analysis.
- Versatility: Beyond its primary role as an evaluator, it also functions effectively as a general instruction-following model for diverse tasks.
- Inference Acceleration: Supports inference acceleration frameworks such as vLLM and LMDeploy.
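As a sketch of how the model might be driven for a scoring task, the snippet below builds a judge prompt and wraps a Hugging Face `transformers` generation call. The model ID, prompt wording, and scoring instructions here are illustrative assumptions, not the official evaluation templates.

```python
# Assumed Hugging Face repository ID; verify against the official release.
MODEL_ID = "opencompass/CompassJudger-1-7B-Instruct"

def build_judge_prompt(question: str, answer: str) -> str:
    """Assemble a simple scoring prompt (illustrative format, not the official template)."""
    return (
        "You are a strict evaluator. Rate the following answer to the question "
        "on a scale of 1-10 and explain your reasoning.\n\n"
        f"[Question]\n{question}\n\n"
        f"[Answer]\n{answer}\n\n"
        "Respond with your reasoning, then a final line: Rating: [[score]]"
    )

def judge(question: str, answer: str) -> str:
    """Load the model and generate a judgment (requires the model weights and a GPU)."""
    # Imported lazily so the prompt helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    messages = [{"role": "user", "content": build_judge_prompt(question, answer)}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The same prompt-building logic can be reused unchanged with vLLM or LMDeploy by swapping out the generation call.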
JudgerBench and Subjective Evaluation
OpenCompass has established JudgerBench, a benchmark that standardizes the evaluation of judge models. CompassJudger-1 is integral to this effort, and its performance on JudgerBench is tracked on a dedicated leaderboard. The model can also be used within the OpenCompass framework to evaluate subjective datasets, providing a robust tool for assessing model performance in nuanced scenarios. A separate leaderboard showcases subjective evaluation results powered by CompassJudger-1.
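Because the model can emit structured verdicts, downstream analysis typically extracts the numeric score programmatically. A minimal sketch, assuming the judge was instructed to end its review with a `Rating: [[n]]` line (a common judge-output convention, not necessarily CompassJudger-1's default format):

```python
import re
from typing import Optional

def extract_rating(review: str) -> Optional[int]:
    """Pull the last 'Rating: [[n]]' marker out of a judge review, if present."""
    matches = re.findall(r"Rating:\s*\[\[(\d+)\]\]", review)
    return int(matches[-1]) if matches else None

# Example: a hypothetical judge review ending in a structured verdict.
review = "The answer is correct and clearly explained.\nRating: [[9]]"
print(extract_rating(review))  # → 9
```

Taking the last match guards against the judge mentioning the rating format while reasoning before stating its final verdict.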