opencompass/CompassJudger-1-14B-Instruct

Available on Hugging Face

  • Task: Text generation
  • Model size: 14.8B parameters
  • Quantization: FP8
  • Context length: 32k
  • Published: Oct 16, 2024
  • License: apache-2.0
  • Architecture: Transformer

The CompassJudger-1 series, developed by OpenCompass, consists of all-in-one judge models designed for comprehensive AI model evaluation. These models support multiple evaluation methods, including scoring, pairwise comparison, and generating detailed assessment reviews in specified formats. Beyond their judging capabilities, they also function as versatile instruction models for general tasks and support inference acceleration frameworks such as vLLM and LMDeploy.


Overview

opencompass/CompassJudger-1-14B-Instruct is part of the CompassJudger-1 series by OpenCompass, designed as an all-in-one judge model for comprehensive AI model evaluation. It stands out for its ability to perform multiple evaluation methods, including scoring, pairwise comparison, and generating detailed assessment feedback with formatted output. The model is not only specialized for judging but also functions as a general instruction model, giving it strong generalization beyond evaluation tasks.

Key Capabilities

  • Comprehensive Evaluation: Supports diverse evaluation methods such as scoring, comparison, and detailed review generation.
  • Formatted Output: Can output assessment details in a specified format, aiding further analysis of evaluation results.
  • Versatility: Functions as a universal instruction model for general tasks in addition to its primary evaluation role.
  • Inference Acceleration: Compatible with acceleration frameworks such as vLLM and LMDeploy for efficient deployment.
  • JudgerBench: OpenCompass has established a new benchmark, JudgerBench, to standardize the evaluation of judge models, with CompassJudger-1 as a key participant.
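The pairwise-comparison use case above can be sketched in code. The helper below assembles a judge prompt and shows (commented out, since it requires downloading the 14B checkpoint) how it would be sent to the model with Hugging Face `transformers`. The prompt template and the `build_judge_prompt` helper are illustrative assumptions, not the official format from the model card.

```python
# Hypothetical sketch of a pairwise-comparison judge prompt for
# CompassJudger-1. The wording of the template is an assumption.

def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Assemble a comparison prompt asking the judge to pick the better answer."""
    return (
        "You are an impartial judge evaluating two answers to a question.\n\n"
        f"[Question]\n{question}\n\n"
        f"[Answer A]\n{answer_a}\n\n"
        f"[Answer B]\n{answer_b}\n\n"
        "Compare the two answers and reply with a verdict in the form "
        '"Verdict: A" or "Verdict: B", followed by a short justification.'
    )

prompt = build_judge_prompt(
    "What is the capital of France?",
    "Paris.",
    "The capital of France is Lyon.",
)

# Sending the prompt via transformers (not executed here):
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model_id = "opencompass/CompassJudger-1-14B-Instruct"
# tok = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
# inputs = tok.apply_chat_template(
#     [{"role": "user", "content": prompt}],
#     add_generation_prompt=True, return_tensors="pt").to(model.device)
# out = model.generate(inputs, max_new_tokens=512)
# print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Keeping prompt construction separate from the model call makes the same template reusable across transformers, vLLM, and LMDeploy backends.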

Good For

  • AI Model Evaluation: Ideal for developers and researchers needing to rigorously evaluate other AI models through scoring, comparison, or detailed qualitative feedback.
  • Automated Assessment: Useful for automating the generation of structured reviews and assessments of model outputs.
  • General Instruction Following: Can be employed for typical instruction-tuned model tasks, offering flexibility beyond its judging specialization.
  • Subjective Dataset Evaluation: Integrates with OpenCompass for evaluating subjective datasets, providing a robust framework for assessing model performance on complex tasks.
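For deployment, the model can be served with vLLM's OpenAI-compatible server. The command below is a minimal sketch: the flag values mirror the context length and quantization listed in the metadata above, but should be adjusted to your hardware.

```shell
# Illustrative only: serve CompassJudger-1-14B-Instruct with vLLM.
vllm serve opencompass/CompassJudger-1-14B-Instruct \
  --max-model-len 32768 \
  --quantization fp8
```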