opencompass/CompassVerifier-7B
CompassVerifier-7B is a 7.6-billion-parameter verifier model developed by OpenCompass and built on the Qwen series architecture. It is designed for accurate, robust evaluation of large language model outputs and for use as an outcome reward signal, with multi-domain competency across math, knowledge, and diverse reasoning tasks. The model handles a range of answer types, including multi-subproblem answers and formulas, identifies abnormal or invalid responses, and remains robust to different prompt styles. Its primary use case is as a lightweight, unified verifier for LLM outputs; it outperforms general-purpose models and other verifiers on VerifierBench.
CompassVerifier-7B: A Robust LLM Verifier
CompassVerifier-7B, developed by OpenCompass, is a 7.6-billion-parameter model designed as an accurate and robust verifier for large language model outputs. Built on the Qwen series architecture, it offers multi-domain competency across math, knowledge, and various reasoning tasks, and can process diverse answer types, including multi-subproblem answers and formulas.
Key Capabilities
- Unified Verification: Acts as a lightweight, unified verifier for LLM outputs.
- Multi-Domain Competency: Excels in evaluating responses across math, knowledge, and general reasoning.
- Robustness: Identifies abnormal, invalid, or long-reasoning responses and is robust to different prompt styles.
- Detailed Analysis: Supports Chain-of-Thought (CoT) mode for increased judgment accuracy on complex problems.
- Reward Model: Demonstrates strong performance as a reward model in Reinforcement Learning (RL) for improving LLM reasoning capabilities, outperforming rule-based and other model-based verifiers.
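As a rough illustration of the reward-model use, a verifier's judgment can be mapped to a scalar reward for an RL loop. The sketch below assumes the verifier's text output ends in a single-letter verdict (A = correct, B = incorrect, C = invalid). That label scheme, the function names, and the reward values are assumptions made for illustration, not the model's documented interface.

```python
import re
from typing import Optional

# Hypothetical label-to-reward mapping; the real verdict format and any
# reward shaping should be taken from the official model documentation.
VERDICT_REWARDS = {"A": 1.0, "B": 0.0, "C": 0.0}


def parse_verdict(verifier_output: str) -> Optional[str]:
    """Return the last standalone A/B/C letter in the output, or None.

    Taking the last match lets a Chain-of-Thought style output reason
    first and state its verdict at the end.
    """
    matches = re.findall(r"\b([ABC])\b", verifier_output)
    return matches[-1] if matches else None


def verdict_to_reward(verifier_output: str, default: float = 0.0) -> float:
    """Map the verifier's text output to a scalar reward.

    Outputs with no recognizable verdict fall back to `default`.
    """
    verdict = parse_verdict(verifier_output)
    if verdict is None:
        return default
    return VERDICT_REWARDS[verdict]
```

Treating unparseable output as zero reward is one conservative choice; a training setup might instead log and skip such samples.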
Good for
- LLM Evaluation: Accurately assessing the correctness and quality of LLM-generated answers.
- Reinforcement Learning (RL): Serving as a reward model to fine-tune LLMs for improved reasoning and problem-solving.
- Quality Control: Identifying and filtering out invalid, incomplete, or low-quality responses from LLMs.
- Complex Problem Verification: Handling multi-subproblem answers, mathematical formulas, and sequence answers with high precision.
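The quality-control use can be sketched as a filter that keeps only the responses a verifier judges correct. The prompt template, the A/B/C labels, and the `judge` callable below are hypothetical stand-ins: the actual prompt format is defined by the model's authors, and `judge` would wrap whatever inference call is used to run the verifier.

```python
from typing import Callable, Dict, List

# Hypothetical template for illustration only; it is not the official
# CompassVerifier prompt.
PROMPT_TEMPLATE = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate response: {response}\n"
    "Judgment (A = correct, B = incorrect, C = invalid):"
)


def build_verifier_prompt(question: str, reference: str, response: str) -> str:
    """Pack the question, reference answer, and candidate into one prompt."""
    return PROMPT_TEMPLATE.format(
        question=question, reference=reference, response=response
    )


def filter_responses(
    samples: List[Dict[str, str]],
    judge: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Keep samples whose verdict from `judge` is "A" (assumed: correct).

    `judge` takes a prompt string and returns the verifier's text output.
    """
    kept = []
    for sample in samples:
        prompt = build_verifier_prompt(
            sample["question"], sample["reference"], sample["response"]
        )
        if judge(prompt).strip() == "A":
            kept.append(sample)
    return kept
```

Passing the verifier in as a callable keeps the filtering logic independent of how the model is served, so a stub can stand in for it during testing.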