xFinder-qwen1505: Key Answer Extraction for LLM Evaluation

xFinder-qwen1505, developed by IAAR, is a 0.6-billion-parameter model fine-tuned from Qwen1.5-0.5B. Its purpose is to extract key answers from the outputs of large language models (LLMs) so that those outputs can be scored reliably. It was created to overcome the limitations of extraction based on regular expressions (RegEx), which often fails when LLM-generated text deviates from the expected answer format.
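To illustrate the RegEx limitation described above, here is a minimal sketch. The pattern and the sample responses are hypothetical and are not taken from the xFinder repository or any particular evaluation framework. It shows one rule handling a well-formatted response and missing two others that state the same answer in different words.

```python
import re
from typing import Optional

# Hypothetical rule of the kind a RegEx-based evaluator might use:
# it only recognizes the literal phrase "answer is" followed by an option letter.
ANSWER_PATTERN = re.compile(r"answer is \(?([A-D])\)?", re.IGNORECASE)


def regex_extract(llm_output: str) -> Optional[str]:
    """Return the option letter matched by the pattern, or None if nothing matches."""
    match = ANSWER_PATTERN.search(llm_output)
    return match.group(1).upper() if match else None


# Three made-up responses that all choose option B.
outputs = [
    "The answer is (B).",
    "After comparing the options, I would go with B.",
    "Option B is correct because the others contradict the passage.",
]

results = [regex_extract(text) for text in outputs]
print(results)  # only the first response is matched by the pattern
```

Adding more patterns covers more phrasings, but each new rule is another special case to maintain, which is the brittleness a learned extractor is meant to avoid.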

Key Capabilities and Features

Enhanced LLM Evaluation: Improves the reliability and accuracy of LLM assessments by providing a robust method for extracting critical information from model responses.
Specialized Training: Fine-tuned on approximately 26.9K samples from the Key Answer Finder (KAF) dataset, which covers diverse tasks and was annotated by GPT-4 and human experts.
Superior Extraction: Achieves higher extraction accuracy and robustness than conventional RegEx-based methods, as evaluated on the human-annotated test and generalization sets of the KAF dataset.

When to Use xFinder-qwen1505

This model is ideal for researchers and developers who need to:

Automate LLM Evaluation: Reliably extract specific answers or data points from LLM outputs for automated performance assessment.
Improve Evaluation Accuracy: Replace brittle RegEx patterns with model-based extraction that handles varied phrasing and formatting in complex text.
Analyze LLM Responses: Gain precise insights into what an LLM has generated by pinpointing key information.
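As a rough sketch of the automated-evaluation use case above, the function below scores a batch of LLM responses against gold answers using a pluggable extractor. All names here (`score_responses`, `toy_extractor`) are hypothetical. `toy_extractor` is a simple placeholder rule, not xFinder itself; in practice the extractor argument would wrap a call to the model as described in the xFinder repository.

```python
import re
from typing import Callable, Optional, Sequence

# An extractor maps a raw LLM response to a key answer, or None if none is found.
Extractor = Callable[[str], Optional[str]]


def score_responses(
    responses: Sequence[str],
    gold_answers: Sequence[str],
    extractor: Extractor,
) -> float:
    """Return the fraction of responses whose extracted key answer equals the gold answer."""
    if len(responses) != len(gold_answers):
        raise ValueError("responses and gold_answers must have the same length")
    if not responses:
        return 0.0
    correct = sum(
        1
        for response, gold in zip(responses, gold_answers)
        if extractor(response) == gold
    )
    return correct / len(responses)


def toy_extractor(response: str) -> Optional[str]:
    """Placeholder extractor: return the last standalone option letter A-D, if any."""
    letters = re.findall(r"\b([A-D])\b", response)
    return letters[-1] if letters else None


# Made-up responses and gold labels for illustration only.
responses = ["The answer is (B).", "I would go with C.", "No idea."]
gold_answers = ["B", "D", "A"]

accuracy = score_responses(responses, gold_answers, toy_extractor)
print(f"accuracy = {accuracy:.2f}")
```

Keeping the extractor as a plain callable means the scoring loop stays unchanged when a RegEx rule is swapped for a model-based extractor, which makes side-by-side comparisons straightforward.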

For more technical details and to explore the code, refer to the xFinder GitHub repository and the associated research paper.