Model Overview
THU-KEG/IF-Verifier-7B is a 7.6-billion-parameter generative reward model developed by Hao Peng@THUKEG. Fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B, it supports both English and Chinese. Its primary purpose is to verify soft instruction-following constraints in generated text.
Key Capabilities
- Instruction Following Verification: Specifically designed to evaluate adherence to instructions, acting as a critic model.
- Efficiency: Can be deployed on a single H800 GPU, with an average reward-computation time of 120 seconds per batch; multi-GPU setups can reduce this further.
- Performance: Achieves verification results comparable to much larger models, specifically noted to be on par with QwQ-32B.
- Extensive Context: Supports a long context window of 131,072 tokens.
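As a generative reward model, the verifier is prompted with an instruction and a candidate response and produces a free-form judgment, which is then mapped to a scalar reward. The sketch below illustrates this flow; the prompt template and the `Final verdict: YES/NO` output format are assumptions for illustration, not the model's documented interface.

```python
# Hypothetical sketch of using IF-Verifier-7B as a generative critic.
# The prompt template and verdict format are assumptions, not the
# model's documented interface.
import re


def build_verifier_prompt(instruction: str, response: str) -> str:
    """Assemble a critic prompt asking the model to judge adherence."""
    return (
        "You are a strict verifier. Given an instruction and a response, "
        "decide whether the response follows every constraint in the "
        "instruction.\n\n"
        f"Instruction:\n{instruction}\n\n"
        f"Response:\n{response}\n\n"
        "Answer with 'Final verdict: YES' or 'Final verdict: NO'."
    )


def parse_reward(critic_output: str) -> float:
    """Map the critic's free-form judgment to a scalar reward (1.0 / 0.0)."""
    match = re.search(r"Final verdict:\s*(YES|NO)", critic_output, re.IGNORECASE)
    if match is None:
        return 0.0  # unparseable output is treated as non-adherent
    return 1.0 if match.group(1).upper() == "YES" else 0.0


# In practice the prompt would be sent to the model (e.g. via transformers
# or an OpenAI-compatible vLLM server); here a canned reply illustrates
# the reward-extraction step.
canned = "The response satisfies all constraints. Final verdict: YES"
print(parse_reward(canned))  # → 1.0
```

During RL training, this scalar can be combined with rule-based checks on hard constraints, with the generative verifier covering the soft constraints the model is designed for.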
Training Details
The model was trained on 131,000 critic data points from the IF-Verifier-Data dataset. More details, including the research paper, are available in the VerIF GitHub repository.
Good For
- Developers and researchers focused on reinforcement learning from human feedback (RLHF) or similar alignment techniques.
- Applications requiring automated evaluation of instruction adherence in large language model outputs.
- Scenarios where efficient, GPU-friendly deployment of a reward model is crucial.