opencompass/CompassJudger-1-7B-Instruct

TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:Sep 29, 2024License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The CompassJudger-1-7B-Instruct is a 7.6 billion parameter instruction-tuned judge model developed by Opencompass, featuring a 131072-token context length. This versatile model excels at comprehensive AI model evaluation, including scoring, comparison, and detailed assessment feedback in specified formats. It also functions as a general-purpose instruction model, making it suitable for both advanced evaluation tasks and daily conversational use.

Loading preview...

CompassJudger-1-7B-Instruct: All-in-one Judge Model

Developed by Opencompass, the CompassJudger-1 series are specialized judge models designed for comprehensive AI model evaluation. This 7.6 billion parameter instruction-tuned model, with a 131072-token context length, offers robust capabilities for assessing other language models.

Key Capabilities

  • Comprehensive Evaluation: Supports various evaluation methods, including scoring, comparison, and generating detailed assessment reviews.
  • Formatted Output: Can output evaluation results and feedback in specific, structured formats, facilitating further analysis.
  • Versatility: Beyond its primary role as an evaluator, it also functions effectively as a general instruction-following model for diverse tasks.
  • Inference Acceleration: Supports model inference acceleration methods like vLLM and LMdeploy.

JudgerBench and Subjective Evaluation

Opencompass has established JudgerBench, a benchmark for standardizing the evaluation of judge models. CompassJudger-1 is integral to this, and its performance on JudgerBench is tracked on a dedicated leaderboard. The model can also be used within the OpenCompass framework to evaluate subjective datasets, providing a robust tool for assessing model performance in nuanced scenarios. A separate leaderboard showcases subjective evaluation results powered by CompassJudger-1.