CompassJudger-1-7B-Instruct: All-in-one Judge Model
Developed by OpenCompass, the CompassJudger-1 series is a family of specialized judge models designed for comprehensive AI model evaluation. This 7.6-billion-parameter instruction-tuned model, with a 131,072-token context length, offers robust capabilities for assessing other language models.
Key Capabilities
- Comprehensive Evaluation: Supports various evaluation methods, including scoring, comparison, and generating detailed assessment reviews.
- Formatted Output: Can output evaluation results and feedback in specific, structured formats, facilitating further analysis.
- Versatility: Beyond its primary role as an evaluator, it also functions effectively as a general instruction-following model for diverse tasks.
- Inference Acceleration: Supports inference acceleration frameworks such as vLLM and LMDeploy.
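As a sketch of how the model might be driven for a scoring task, the snippet below builds a judge prompt and wraps a Hugging Face `transformers` generation call. The model ID, prompt wording, and scoring instructions here are illustrative assumptions, not the official evaluation templates.

```python
# Assumed Hugging Face repository ID; verify against the official release.
MODEL_ID = "opencompass/CompassJudger-1-7B-Instruct"

def build_judge_prompt(question: str, answer: str) -> str:
    """Assemble a simple scoring prompt (illustrative format, not the official template)."""
    return (
        "You are a strict evaluator. Rate the following answer to the question "
        "on a scale of 1-10 and explain your reasoning.\n\n"
        f"[Question]\n{question}\n\n"
        f"[Answer]\n{answer}\n\n"
        "Respond with your reasoning, then a final line: Rating: [[score]]"
    )

def judge(question: str, answer: str) -> str:
    """Load the model and generate a judgment (requires the model weights and a GPU)."""
    # Imported lazily so the prompt helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    messages = [{"role": "user", "content": build_judge_prompt(question, answer)}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The same prompt-building logic can be reused unchanged with vLLM or LMDeploy by swapping out the generation call.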
JudgerBench and Subjective Evaluation
OpenCompass has established JudgerBench, a benchmark that standardizes the evaluation of judge models. CompassJudger-1 is integral to this effort, and its performance on JudgerBench is tracked on a dedicated leaderboard. The model can also be used within the OpenCompass framework to evaluate subjective datasets, providing a robust tool for assessing model performance in nuanced scenarios. A separate leaderboard showcases subjective evaluation results powered by CompassJudger-1.
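Because the model can emit structured verdicts, downstream analysis typically extracts the numeric score programmatically. A minimal sketch, assuming the judge was instructed to end its review with a `Rating: [[n]]` line (a common judge-output convention, not necessarily CompassJudger-1's default format):

```python
import re
from typing import Optional

def extract_rating(review: str) -> Optional[int]:
    """Pull the last 'Rating: [[n]]' marker out of a judge review, if present."""
    matches = re.findall(r"Rating:\s*\[\[(\d+)\]\]", review)
    return int(matches[-1]) if matches else None

# Example: a hypothetical judge review ending in a structured verdict.
review = "The answer is correct and clearly explained.\nRating: [[9]]"
print(extract_rating(review))  # → 9
```

Taking the last match guards against the judge mentioning the rating format while reasoning before stating its final verdict.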