opencompass/CompassVerifier-7B

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quantization: FP8 · Context Length: 32K · Published: Jul 9, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

CompassVerifier-7B is a 7.6 billion parameter verifier model developed by OpenCompass and built on the Qwen series architecture. It is designed for accurate, robust evaluation and outcome reward of large language model outputs, with multi-domain competency across math, knowledge, and diverse reasoning tasks. The model handles varied answer types, including multi-subproblem answers and formulas, identifies abnormal or invalid responses, and stays robust across different prompt styles. Its primary use case is as a lightweight, unified verifier for LLM outputs, where it outperforms general-purpose models and other verifiers on the VerifierBench benchmark.


CompassVerifier-7B: A Robust LLM Verifier

CompassVerifier-7B, developed by OpenCompass, is a 7.6 billion parameter model specifically designed as an accurate and robust verifier for large language model outputs. Built on the Qwen series architecture, it offers multi-domain competency across math, knowledge, and various reasoning tasks, capable of processing diverse answer types including multi-subproblems and formulas.
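The snippet below is a minimal sketch of querying the checkpoint as a verifier through Hugging Face transformers. The prompt wording, the A/B verdict convention, and the example question/answer triple are illustrative assumptions, not the official CompassVerifier template; consult the upstream repository for the exact prompts.

```python
# Minimal verification sketch, assuming a transformers-compatible chat checkpoint.
# The prompt template and A/B verdict convention below are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "opencompass/CompassVerifier-7B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)

# Invented example triple: question, gold answer, and a candidate response.
question = "What is 17 * 23?"
gold_answer = "391"
candidate = "17 * 23 = 391."

prompt = (
    "Judge whether the candidate response answers the question correctly.\n"
    f"Question: {question}\n"
    f"Gold answer: {gold_answer}\n"
    f"Candidate response: {candidate}\n"
    "Reply with 'A' if correct or 'B' if incorrect."
)

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=8)
verdict = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
)
print(verdict)  # expected: a verdict beginning with 'A' for this example
```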

Key Capabilities

  • Unified Verification: Acts as a lightweight, unified verifier for LLM outputs.
  • Multi-Domain Competency: Excels in evaluating responses across math, knowledge, and general reasoning.
  • Robustness: Identifies abnormal or invalid responses, remains reliable on long reasoning outputs, and tolerates different prompt styles.
  • Detailed Analysis: Supports a Chain-of-Thought (CoT) mode that trades extra reasoning tokens for higher judgment accuracy on complex problems (see the prompt sketch after this list).
  • Reward Model: Demonstrates strong performance as a reward model in Reinforcement Learning (RL) for improving LLM reasoning capabilities, outperforming rule-based and other model-based verifiers.
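For harder problems, a CoT-style prompt can ask the verifier to reason before judging. Below is a hypothetical prompt builder; the wording is again illustrative rather than the official template.

```python
# Hypothetical CoT-mode prompt builder; the wording is illustrative, not the
# official CompassVerifier template. Requiring a final 'Verdict:' line keeps
# the step-by-step reasoning easy to parse.
def build_cot_prompt(question: str, gold_answer: str, candidate: str) -> str:
    return (
        "Judge whether the candidate response answers the question correctly.\n"
        f"Question: {question}\n"
        f"Gold answer: {gold_answer}\n"
        f"Candidate response: {candidate}\n"
        "Reason step by step, then end with a single line of the form "
        "'Verdict: A' (correct) or 'Verdict: B' (incorrect)."
    )
```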

Good for

  • LLM Evaluation: Accurately assessing the correctness and quality of LLM-generated answers.
  • Reinforcement Learning (RL): Serving as a reward model to fine-tune LLMs for improved reasoning and problem-solving (a minimal reward-mapping sketch follows this list).
  • Quality Control: Identifying and filtering out invalid, incomplete, or low-quality responses from LLMs.
  • Complex Problem Verification: Handling multi-subproblem answers, mathematical formulas, and sequence answers with high precision.
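When the verifier acts as an RL reward model, its text verdict must be collapsed to a scalar. A minimal sketch under the same assumed A/B convention, where anything unparseable is scored as incorrect, in line with the model's handling of abnormal responses:

```python
# Map a verifier verdict to a scalar reward for an RL trainer.
# Assumes the illustrative A/B convention from the sketches above.
def verdict_to_reward(verdict: str) -> float:
    v = verdict.strip().upper()
    if v.startswith("VERDICT:"):   # CoT mode: strip the label first
        v = v[len("VERDICT:"):].strip()
    if v.startswith("A"):          # judged correct
        return 1.0
    return 0.0                     # incorrect, abnormal, or unparseable
```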