CompassJudger-1: All-in-one Judge Model
CompassJudger-1, developed by OpenCompass, is a 32.8 billion parameter instruction-tuned model designed primarily for comprehensive evaluation tasks. It functions as an "all-in-one" judge, capable of applying multiple assessment methods and generating detailed reviews in user-specified formats.
Key Capabilities:
- Comprehensive Evaluation: Supports multiple evaluation methods, including scoring, comparison, and detailed assessment feedback.
- Formatted Output: Can output evaluation results in a specific, user-defined format for easier analysis.
- Versatility: Beyond its judging capabilities, it can also handle general instruction-following tasks, similar to a standard instruction model.
- Inference Acceleration: Supports inference acceleration frameworks such as vLLM and LMDeploy for efficient deployment.
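To make the scoring, comparison, and formatted-output capabilities above concrete, here is a minimal sketch of a pairwise-comparison judging workflow. The prompt template and the `Verdict:` output convention are illustrative assumptions, not the official CompassJudger-1 prompt format; consult the model card for the exact templates the model was tuned on.

```python
import re

# Hypothetical pairwise-comparison template -- the exact prompt used by
# CompassJudger-1 may differ; this only illustrates the overall shape.
JUDGE_TEMPLATE = """You are a judge. Compare the two assistant responses below.
[Question]
{question}
[Response A]
{answer_a}
[Response B]
{answer_b}
State your verdict on the last line, exactly as one of:
Verdict: A / Verdict: B / Verdict: Tie"""


def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Fill the comparison template with the items under evaluation."""
    return JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )


def parse_verdict(judge_reply: str) -> str:
    """Extract the verdict from the judge's formatted reply.

    Because the judge was asked for a fixed output format, a simple
    regex suffices; anything off-format is flagged as "Unparsed".
    """
    match = re.search(r"Verdict:\s*(A|B|Tie)", judge_reply)
    return match.group(1) if match else "Unparsed"
```

In practice, `build_judge_prompt(...)` would be sent to CompassJudger-1 (e.g. via a vLLM or LMDeploy endpoint) and `parse_verdict(...)` applied to the generated text; requesting a fixed output format up front is what makes the reply machine-parseable.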
JudgerBench and Subjective Evaluation:
OpenCompass has introduced JudgerBench, a new benchmark that standardizes the evaluation of judge models. CompassJudger-1 is integral to this benchmark, helping to identify more effective evaluator models. Users can test their own judge models on JudgerBench using the provided OpenCompass scripts and contribute results to the leaderboard.
Additionally, CompassJudger-1 can be used within the OpenCompass framework to evaluate common subjective datasets, such as AlignBench, by configuring it as the judge model for other LLMs. A separate subjective evaluation leaderboard showcases its performance in this role.
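Leaderboards like the ones above are built by aggregating many individual judge verdicts into a single metric. A minimal sketch of one common aggregation, win rate with ties counted as half a win, follows; this convention is an illustrative assumption, not necessarily how JudgerBench or the subjective leaderboard compute their scores.

```python
from collections import Counter


def win_rate(verdicts: list[str]) -> float:
    """Aggregate pairwise verdicts ("A", "B", or "Tie") into model A's
    win rate over model B, counting each tie as half a win."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    return (counts["A"] + 0.5 * counts["Tie"]) / total
```

For example, over the verdicts `["A", "B", "Tie", "A"]` model A's win rate is `(2 + 0.5) / 4 = 0.625`; ranking models by this number yields a leaderboard ordering.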
Good for:
- Automated evaluation of LLM responses through scoring and comparison.
- Generating structured, detailed feedback for model assessment.
- General instruction-following tasks where a versatile model with strong judging capabilities is beneficial.
- Developers looking to integrate a robust, format-aware judge into their evaluation pipelines.