opencompass/CompassJudger-1-32B-Instruct
opencompass/CompassJudger-1-32B-Instruct is a 32.8-billion-parameter instruction-tuned language model developed by the OpenCompass team, designed as an all-in-one judge model. It supports multiple evaluation methods, including scoring and pairwise comparison, and can output detailed assessment reviews in user-specified formats. Beyond evaluation, it also handles general instruction-following tasks, making it useful both as a judge and as a general-purpose LLM.
CompassJudger-1: All-in-one Judge Model
CompassJudger-1, developed by the OpenCompass team, is a 32.8-billion-parameter instruction-tuned model designed primarily for comprehensive evaluation tasks. It functions as an "all-in-one" judge, capable of performing various assessment methods and generating detailed reviews in specified formats.
Key Capabilities:
- Comprehensive Evaluation: Supports multiple evaluation methods, including scoring, pairwise comparison, and detailed assessment feedback (see the usage sketch after this list).
- Formatted Output: Can output evaluation results in a specific, user-defined format for easier analysis.
- Versatility: Beyond its judging capabilities, it can also handle general instruction-following tasks, similar to a standard instruction model.
- Inference Acceleration: Supports accelerated inference backends such as vLLM and LMDeploy.
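As a concrete illustration, here is a minimal sketch of prompting the model as a pointwise judge via Hugging Face transformers. The judge prompt wording and the JSON scoring format are illustrative assumptions, not the official template; consult the model card for the recommended judge prompts.

```python
# Minimal sketch: using CompassJudger-1 as a pointwise judge via transformers.
# The judge prompt below is an illustrative assumption, not the official template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "opencompass/CompassJudger-1-32B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Hypothetical judging prompt: ask for a 1-10 score plus a short rationale.
prompt = (
    "You are a strict evaluator. Rate the following response to the question "
    "on a scale of 1-10 and briefly justify your score.\n\n"
    "Question: What causes tides on Earth?\n"
    "Response: Tides are caused mainly by the Moon's gravitational pull.\n\n"
    'Output format: {"score": <int>, "reason": "<one sentence>"}'
)

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
review = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(review)  # e.g. a JSON-formatted score and rationale
```

Because the model is format-aware, constraining the output to JSON as above tends to make downstream parsing straightforward; for large batches, the same chat messages can be served through vLLM or LMDeploy instead.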
JudgerBench and Subjective Evaluation:
OpenCompass has introduced JudgerBench, a benchmark that standardizes the evaluation of judge models. CompassJudger-1 is integral to this benchmark, helping to identify more effective evaluator models. Users can test their own judge models on JudgerBench with the provided OpenCompass scripts and contribute results to the leaderboard; a config sketch follows.
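The snippet below is a hedged sketch of registering a model in an OpenCompass-style Python config for a JudgerBench run. The class name `HuggingFacewithChatTemplate` and the dict keys follow common OpenCompass config conventions, but the exact JudgerBench config file and fields should be verified against the OpenCompass repository.

```python
# Sketch of an OpenCompass-style model config entry for a JudgerBench run.
# Field names follow common OpenCompass conventions; verify against the
# actual JudgerBench config in the OpenCompass repository.
from opencompass.models import HuggingFacewithChatTemplate

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr="CompassJudger-1-32B-Instruct",
        path="opencompass/CompassJudger-1-32B-Instruct",
        max_out_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=2),
    )
]
```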
Additionally, CompassJudger-1 can be used within the OpenCompass framework to evaluate common subjective datasets, such as AlignBench, by configuring it as the judge model for other LLMs, as sketched below. A separate subjective-evaluation leaderboard tracks its performance in this role.
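For subjective evaluation, OpenCompass configs typically declare the judge separately from the models under test. The following sketch rests on that assumption: the `judge_models` variable name mirrors OpenCompass subjective-eval configs, while the dataset, inferencer, and summarizer settings are omitted and should be taken from the framework's own examples.

```python
# Sketch: reusing the same model entry as the judge in a subjective-eval
# config such as AlignBench. Only the judge declaration is shown; dataset
# and summarizer settings come from OpenCompass's subjective-eval examples.
from opencompass.models import HuggingFacewithChatTemplate

judge_models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr="CompassJudger-1-32B-Instruct",
        path="opencompass/CompassJudger-1-32B-Instruct",
        max_out_len=2048,
        run_cfg=dict(num_gpus=2),
    )
]
```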
Good for:
- Automated evaluation of LLM responses through scoring and comparison.
- Generating structured, detailed feedback for model assessment.
- General instruction-following tasks where a versatile model with strong judging capabilities is beneficial.
- Developers looking to integrate a robust, format-aware judge into their evaluation pipelines.