IAAR-Shanghai/xFinder-llama38it
Text Generation · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: May 20, 2024 · License: cc-by-nc-nd-4.0 · Architecture: Transformer · Concurrency Cost: 1 · Open Weights
xFinder-llama38it is an 8 billion parameter model developed by IAAR, fine-tuned from Llama3-8B-Instruct with an 8192-token context length. It is specifically designed for accurate key answer extraction from large language model outputs, addressing the limitations of traditional RegEx methods. This model enhances the reliability of LLM evaluation across diverse tasks by improving extraction accuracy and robustness.
xFinder-llama38it: Key Answer Extraction for LLM Evaluation
xFinder-llama38it, developed by IAAR, is an 8 billion parameter model fine-tuned from Llama3-8B-Instruct. Its primary purpose is to perform key answer extraction from the outputs of large language models (LLMs).
Key Capabilities and Features
- Enhanced Evaluation: Improves the reliability and accuracy of LLM assessments by precisely extracting key answers from complex and varied LLM generations.
- Overcomes RegEx Limitations: Addresses the shortcomings of traditional regular expression-based extraction methods, which often struggle with the diversity of LLM outputs.
- Specialized Training: Fine-tuned on approximately 26.9K samples from the Key Answer Finder (KAF) dataset, meticulously annotated by GPT-4 and human experts.
- Robust Performance: Demonstrates significant improvements in extraction accuracy and robustness, as evaluated on human-annotated test and generalization sets of the KAF dataset.
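To make the RegEx limitation above concrete, here is a minimal sketch of a pattern-based extractor and the kinds of perfectly valid LLM phrasings it misses. The pattern and function names are illustrative assumptions, not the actual extraction logic xFinder replaces.

```python
import re
from typing import Optional

# Illustrative fixed-template pattern (an assumption, not xFinder's baseline):
# it expects outputs shaped like "The answer is (B)."
ANSWER_RE = re.compile(r"[Tt]he answer is \(?([A-D])\)?")

def regex_extract(llm_output: str) -> Optional[str]:
    """Return the extracted option letter, or None if the pattern misses."""
    m = ANSWER_RE.search(llm_output)
    return m.group(1) if m else None

# Works when the model follows the expected template...
print(regex_extract("The answer is (B)."))                   # B
# ...but silently fails on equally valid free-form phrasings,
# which is the diversity problem a fine-tuned extractor addresses.
print(regex_extract("I would go with option B, since..."))   # None
print(regex_extract("Option B (42) is correct."))            # None
```

Each missed extraction is scored as a wrong answer during evaluation, which is why extraction robustness directly affects benchmark reliability.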
When to Use This Model
- Automated LLM Evaluation: Ideal for researchers and developers needing a more reliable and accurate method to evaluate LLM performance across various tasks.
- Complex Output Analysis: Suitable for scenarios where LLM outputs are diverse and require precise extraction of specific information, beyond what simple pattern matching can achieve.
- Research and Development: Useful for those exploring advanced methods for understanding and assessing the factual correctness or specific information retrieval capabilities of LLMs.
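As a rough sketch of how the model slots into an evaluation pipeline: the extractor is prompted with the original question, the evaluated LLM's raw output, and the candidate answer range, then generates the key answer. The prompt template and helper names below are assumptions for illustration; the exact format xFinder expects is defined in the IAAR-Shanghai/xFinder repository.

```python
# Hypothetical prompt builder -- the real template is specified in the
# IAAR-Shanghai/xFinder repo; this only shows the inputs the extractor consumes.
def build_extraction_prompt(question: str, llm_output: str, answer_range: list) -> str:
    return (
        "Extract the key answer from the model response below.\n"
        f"Question: {question}\n"
        f"Model response: {llm_output}\n"
        f"Candidate answers: {', '.join(answer_range)}\n"
        "Key answer:"
    )

def extract_key_answer(prompt: str, model_id: str = "IAAR-Shanghai/xFinder-llama38it") -> str:
    """Run the extractor; requires `transformers`, `torch`, and the 8B weights."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy deps, loaded lazily
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=16)
    # Decode only the newly generated tokens, i.e. the extracted answer.
    answer_ids = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(answer_ids, skip_special_tokens=True).strip()
```

The extracted answer can then be compared directly against the gold label, replacing brittle pattern matching in the evaluation harness.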