Overview
What is CompassVerifier-3B?
CompassVerifier-3B, developed by OpenCompass, is a lightweight 3.1-billion-parameter verifier model built on the Qwen model series. It is designed for accurate, robust evaluation of Large Language Model (LLM) outputs and for use as an outcome reward model.
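The sketch below shows one way the model might be invoked as a verifier through Hugging Face transformers. It is not the official usage snippet: the repository id, the verification prompt, and the A/B label convention are assumptions for illustration; consult the official model card for the exact prompt template and output format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "opencompass/CompassVerifier-3B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map needs accelerate
)

# Hypothetical verification prompt: question, gold answer, candidate response.
prompt = (
    "Judge whether the candidate response answers the question correctly.\n"
    "Question: What is 7 * 8?\n"
    "Gold answer: 56\n"
    "Candidate response: 7 * 8 = 56\n"
    "Reply with A (correct) or B (incorrect)."
)
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=16, do_sample=False)
# Decode only the newly generated tokens, i.e. the verdict.
verdict = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(verdict)
```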
Key Capabilities
- Multi-domain Competency: Excels across mathematical, knowledge-based, and diverse reasoning tasks.
- Versatile Answer Processing: Capable of handling various answer types, including multi-subproblems, formulas, and sequence answers.
- Robust Error Identification: Effectively flags abnormal or invalid responses, including those with excessively long reasoning.
- Prompt Style Robustness: Maintains performance across different prompt styles.
- Reinforcement Learning Integration: Proven effective as a reward model in RL frameworks, significantly improving the reasoning capabilities of base models.
Performance Highlights
On the newly released VerifierBench benchmark, CompassVerifier-3B achieves an average F1 score of 80.4%, outperforming larger general-purpose models and other verifier models of comparable size. It is also robust to varying prompt styles, reaching 87.4% accuracy and a 77.4% F1 score with model-specific prompts on VerifyBench.
Should you use this for your use case?
- LLM Evaluation: Ideal for developers needing an accurate and lightweight model to verify the correctness and quality of LLM-generated responses across various domains.
- RL Fine-tuning: Highly suitable as a reward model in reinforcement learning setups to enhance the reasoning capabilities of other LLMs (see the reward-shaping sketch after this list).
- Quality Control: Useful for applications requiring automated identification of problematic or incorrect LLM outputs.
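For the RL use case, the following is a minimal sketch of turning verifier verdicts into scalar outcome rewards for a policy trainer. It is not OpenCompass's documented RL integration: `run_verifier` is a hypothetical helper (for example, wrapping the inference snippet above), and the "A"/"B" label convention and the helper names `verdict_to_reward` / `score_rollouts` are assumptions.

```python
from typing import Callable


def verdict_to_reward(verdict: str) -> float:
    """Map a raw verifier verdict string to a scalar outcome reward."""
    # Assumed convention: a leading "A" means the response was judged correct.
    return 1.0 if verdict.strip().upper().startswith("A") else 0.0


def score_rollouts(
    run_verifier: Callable[[str, str, str], str],
    question: str,
    gold_answer: str,
    rollouts: list[str],
) -> list[float]:
    """Score a batch of policy rollouts for an RL trainer (e.g. PPO or GRPO)."""
    return [
        verdict_to_reward(run_verifier(question, gold_answer, response))
        for response in rollouts
    ]
```

In this setup the verifier acts purely as an outcome reward model: each rollout receives a single reward based on final-answer correctness, which the RL algorithm then uses to update the policy.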