THU-KEG/IF-Verifier-7B

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Jun 5, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

THU-KEG/IF-Verifier-7B is a 7.6-billion-parameter generative reward model developed by Hao Peng (THU-KEG) and fine-tuned from DeepSeek-R1-Distill-Qwen-7B. It is designed to verify the soft constraints of instruction following and supports both English and Chinese. With a context length of 131,072 tokens, it specializes in evaluating how well instructions are followed, achieving verification performance comparable to much larger models such as QwQ-32B.


Model Overview

THU-KEG/IF-Verifier-7B is a generative reward model: given an instruction and a candidate response, it produces a critique judging whether the response satisfies the instruction's soft constraints, i.e., requirements such as style or tone that cannot be checked with rule-based code. It is fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and supports both English and Chinese.
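
As a concrete illustration, here is a minimal inference sketch using Hugging Face transformers. The verification prompt below is an assumption for illustration only; the exact prompt template the authors use is documented in the VerIF repository.

```python
# Minimal sketch: querying IF-Verifier-7B as an instruction-following critic.
# Requires: transformers, accelerate, torch.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THU-KEG/IF-Verifier-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Hypothetical verification prompt: ask the critic whether a response
# satisfies the soft constraints stated in the instruction. The real
# template may differ — consult the VerIF repository.
instruction = "Write a formal apology email in under 100 words."
response = "Dear customer, we sincerely apologize for the delayed shipment..."
prompt = (
    f"Instruction:\n{instruction}\n\n"
    f"Response:\n{response}\n\n"
    "Does the response satisfy every constraint in the instruction? "
    "Answer YES or NO with a brief justification."
)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Print only the newly generated critique, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```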

Key Capabilities

  • Instruction Following Verification: Designed to evaluate how well a response adheres to the soft constraints of an instruction, acting as a critic model.
  • Efficiency: Can be deployed on a single H800 GPU, with an average reward-computation time of 120 seconds per batch, which can be reduced further with multi-GPU setups (see the deployment sketch after this list).
  • Performance: Achieves verification results comparable to much larger models, notably on par with QwQ-32B.
  • Extensive Context: Supports a context length of 131,072 tokens.
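
The efficiency figures above suggest serving the verifier with a standard inference engine. The following is a minimal single-GPU sketch using vLLM; the model ID is taken from this card, while the context cap, parallelism, and sampling settings are illustrative assumptions.

```python
# Sketch of a single-GPU vLLM deployment of the verifier.
from vllm import LLM, SamplingParams

llm = LLM(
    model="THU-KEG/IF-Verifier-7B",
    tensor_parallel_size=1,   # single H800; raise for multi-GPU setups
    max_model_len=32768,      # cap context to fit memory; model supports up to 131,072
)
params = SamplingParams(temperature=0.0, max_tokens=512)

# Each prompt is one (instruction, response) verification query, formatted
# as in the earlier sketch.
prompts = ["...verification prompt for one (instruction, response) pair..."]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```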

Training Details

The model was trained using 131,000 critic data points from the IF-Verifier-Data dataset. More detailed information, including the research paper, can be found in the VerIF GitHub repository.
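
For reference, here is a short sketch of how one might inspect that dataset with the Hugging Face datasets library. The Hub ID THU-KEG/IF-Verifier-Data and the field layout are assumptions inferred from the dataset name above; verify them against the VerIF repository.

```python
# Hypothetical sketch for inspecting the critic training data.
from datasets import load_dataset

ds = load_dataset("THU-KEG/IF-Verifier-Data", split="train")
print(len(ds))   # expected on the order of 131k critic examples
print(ds[0])     # one critic example (exact fields depend on the release)
```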

Good For

  • Developers and researchers focused on reinforcement learning from human feedback (RLHF) or similar alignment techniques (see the reward-extraction sketch after this list).
  • Applications requiring automated evaluation of instruction adherence in large language model outputs.
  • Scenarios where efficient, GPU-friendly deployment of a reward model is crucial.
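
For RLHF-style use, the generative critique must be reduced to a scalar reward. Below is a hedged sketch of one way to do this; the YES/NO verdict convention is an assumption carried over from the earlier prompt sketch, not the authors' documented protocol.

```python
# Sketch: turning the verifier's free-text judgment into a scalar reward
# for an RL loop, assuming the critic ends its reasoning with YES or NO.
import re

def judgment_to_reward(text: str) -> float:
    """Map the critic's final verdict to a binary reward."""
    # Take the last YES/NO token the model emitted after its reasoning.
    verdicts = re.findall(r"\b(YES|NO)\b", text.upper())
    if not verdicts:
        return 0.0  # no parsable verdict: treat as a failed verification
    return 1.0 if verdicts[-1] == "YES" else 0.0

print(judgment_to_reward("The response exceeds 100 words. NO"))  # -> 0.0
```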