IAAR-Shanghai/xVerify-7B-I

Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Dec 3, 2025 | License: cc-by-nc-nd-4.0 | Architecture: Transformer | Open Weights

IAAR-Shanghai/xVerify-7B-I is a 7.6-billion-parameter evaluation tool, fine-tuned from a pre-trained large language model, for verifying answers to objective questions that have a single correct answer. Developed by IAAR-Shanghai, it extracts final answers from complex reasoning processes and judges whether answers written in different expression formats are equivalent. It is particularly suited to evaluating math problems, multiple-choice questions, and classification tasks, and it supports both Chinese and English responses.


xVerify-7B-I: An Efficient Answer Verifier

xVerify-7B-I is a 7.6-billion-parameter model developed by IAAR-Shanghai and fine-tuned specifically as an evaluation tool for objective questions. It is presented in the paper "xVerify: Efficient Answer Verifier for Reasoning Model Evaluations" (arXiv:2504.10481). Its primary function is to accurately extract final answers from reasoning processes and to efficiently determine whether answers expressed in different forms are equivalent.
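Pulling the final answer out of a long reasoning trace is harder than it looks. As an illustration, the sketch below is a hypothetical regex baseline of the kind an LLM-based verifier is meant to improve on. It is not part of xVerify; the `\boxed{...}` and "the answer is ..." patterns are assumptions about common output conventions, and `extract_final_answer` is an illustrative name.

```python
import re
from typing import Optional

# Hypothetical rule-based baseline, NOT xVerify. It recognizes two common
# surface conventions for marking a final answer in a reasoning trace.
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")  # fails on nested braces such as \frac{1}{2}
_ANSWER_IS = re.compile(r"answer is[:\s]*([^\n]+)", re.IGNORECASE)


def extract_final_answer(response: str) -> Optional[str]:
    """Return the last \\boxed{...} content, else the last 'answer is ...' phrase."""
    boxed = _BOXED.findall(response)
    if boxed:
        return boxed[-1].strip()
    phrases = _ANSWER_IS.findall(response)
    if phrases:
        # Drop a sentence-final period without truncating decimals like 3.5
        return phrases[-1].strip().rstrip(".")
    return None  # unrecognized phrasing: a rule-based extractor simply gives up


# Usage
trace = "Let x + 2 = 6.\nThen x = 6 - 2.\nThe answer is 4."
print(extract_final_answer(trace))  # 4
```

The extractor returns `None` for free-form phrasing and fails on nested LaTeX, which is the kind of brittleness that motivates using a trained model for extraction.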

Key Capabilities

  • Broad Applicability: Suitable for diverse objective question evaluation scenarios, including mathematical problems, multiple-choice questions, classification tasks, and short-answer questions.
  • Handles Long Reasoning Chains: Extracts the final answer from extensive reasoning, regardless of how complex the intermediate steps are.
  • Multilingual Support: Primarily supports Chinese and English responses, and is also compatible with other languages.
  • Powerful Equivalence Judgment: Features robust capabilities for recognizing equivalence, including:
    • Basic transformations (e.g., letter case, Greek letter conversions).
    • Equivalent mathematical expressions (e.g., LaTeX, fractions, scientific notation).
    • Semantic equivalence in natural language answers.
    • Matching multiple-choice responses by content rather than just option identifiers.
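To make the equivalence categories above concrete, here is a toy rule-based checker covering letter case, spelled-out Greek letter names, simple LaTeX fractions, decimals and scientific notation, and matching multiple-choice answers by option content. It is a hypothetical illustration only: xVerify is a fine-tuned LLM and is not described as using rules like these, and semantic equivalence in natural language is beyond what such rules can capture. All function names are illustrative.

```python
import re
from fractions import Fraction
from typing import Dict, Optional

# Toy rules for illustration only; xVerify itself is an LLM, not a rule set.
_GREEK = {"alpha": "α", "beta": "β", "gamma": "γ", "theta": "θ", "pi": "π"}
_FRAC = re.compile(r"\\frac\{(-?\d+)\}\{(\d+)\}")


def _latex_to_plain(s: str) -> str:
    """Drop surrounding $...$ and rewrite simple \\frac{a}{b} as 'a/b'."""
    return _FRAC.sub(r"\1/\2", s.strip().strip("$"))


def _as_number(s: str) -> Optional[Fraction]:
    """Parse '1/2', '0.5' or '5e-1' as an exact Fraction; None if not numeric."""
    try:
        return Fraction(_latex_to_plain(s))
    except (ValueError, ZeroDivisionError):
        return None


def _normalize_text(s: str) -> str:
    """Case-fold, trim, and map spelled-out Greek letter names to symbols."""
    s = s.strip().lower()
    return _GREEK.get(s, s)


def equivalent(candidate: str, reference: str,
               options: Optional[Dict[str, str]] = None) -> bool:
    """True if candidate and reference match under the toy rules above."""
    if options:
        # Resolve option letters to their text so 'B' matches the content of B.
        by_letter = {k.strip().lower(): v for k, v in options.items()}
        candidate = by_letter.get(candidate.strip().lower(), candidate)
        reference = by_letter.get(reference.strip().lower(), reference)
    a, b = _as_number(candidate), _as_number(reference)
    if a is not None and b is not None:
        return a == b  # exact rational comparison, so 1/2 == 0.5 == 5e-1
    # Semantic equivalence (paraphrase) is out of reach for rules like these.
    return _normalize_text(candidate) == _normalize_text(reference)
```

Comparing numbers as exact `Fraction` values rather than floats avoids rounding surprises, but every new notation needs another hand-written rule, which is the maintenance burden a learned verifier sidesteps.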

Good For

  • Automated evaluation of LLM outputs on objective tasks.
  • Verifying correctness in educational or assessment systems.
  • Applications requiring precise answer extraction and equivalence checking from complex text.
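For automated evaluation, a verifier is typically called once per (question, response, reference) triple and the verdicts are aggregated into an accuracy score. The harness below is a minimal sketch under that assumption. `Verifier`, `Example`, `evaluate`, and the stub `last_line_verifier` are hypothetical names; in a real pipeline the stub would be replaced by a call to xVerify using the prompt format documented by its authors, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical interface: (question, llm_response, reference_answer) -> correct?
# A real pipeline might implement this by prompting xVerify.
Verifier = Callable[[str, str, str], bool]


@dataclass
class Example:
    question: str
    response: str   # output of the LLM under evaluation
    reference: str  # gold answer


def evaluate(examples: Iterable[Example], verify: Verifier) -> float:
    """Fraction of examples judged correct by `verify` (0.0 for an empty set)."""
    verdicts: List[bool] = [verify(e.question, e.response, e.reference) for e in examples]
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def last_line_verifier(question: str, response: str, reference: str) -> bool:
    """Stub verifier: does the response's last line contain the reference (case-insensitive)?"""
    lines = response.strip().splitlines()
    return bool(lines) and reference.lower() in lines[-1].lower()


# Usage with the stub standing in for a model-based verifier
data = [
    Example("What is 2 + 2?", "2 + 2 = 4\nAnswer: 4", "4"),
    Example("Capital of France?", "I believe it is Lyon", "Paris"),
]
print(evaluate(data, last_line_verifier))  # 0.5
```

Keeping the verifier as a plain callable lets the same harness compare a rule-based checker against a model-based one on identical data.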