Model Overview
Salesforce/FARE-8B is an 8-billion-parameter multi-task generative evaluator model, developed by Austin Xu, Xuan-Phi Nguyen, Yilun Zhou, Chien-Sheng Wu, Caiming Xiong, and Shafiq Joty. It is fine-tuned from the Qwen-8B base model and designed for automated evaluation in reasoning-centric domains. The model leverages large-scale multi-task, multi-domain data mixtures and rejection-sampling supervised fine-tuning (SFT) to achieve its specialized evaluation capabilities.
Key Capabilities
- Multi-task Evaluation: Performs a range of evaluation tasks, including pairwise comparisons, step-level error identification, reference-based verification, reference-free verification, and single-rating assessment.
- Reasoning-Centric: Optimized for evaluating responses in domains requiring strong reasoning, as detailed in its accompanying paper.
- Prompt-Template Driven: Designed to be used with specific system and user prompt templates for each evaluation task, ensuring consistent and accurate assessments.
- Long Context: Supports a context length of 32,768 tokens, allowing evaluation of long responses or complex multi-turn interactions.
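As a minimal sketch of the prompt-template-driven workflow, the snippet below assembles a chat-format pairwise-comparison prompt. The system and user template strings here are illustrative placeholders, not the model's official templates; for real evaluations, substitute the task-specific templates that accompany the model.

```python
# Illustrative sketch: building a pairwise-comparison chat prompt for FARE-8B.
# The template text below is a placeholder (an assumption for illustration);
# use the official per-task templates from the model card in practice.

SYSTEM_TEMPLATE = (
    "You are an evaluator. Compare the two candidate responses to the "
    "instruction and state which one is better, 'A' or 'B'."
)

USER_TEMPLATE = (
    "Instruction:\n{instruction}\n\n"
    "Response A:\n{response_a}\n\n"
    "Response B:\n{response_b}"
)

def build_pairwise_messages(instruction: str,
                            response_a: str,
                            response_b: str) -> list[dict]:
    """Assemble a chat-format message list for one pairwise evaluation."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE},
        {"role": "user", "content": USER_TEMPLATE.format(
            instruction=instruction,
            response_a=response_a,
            response_b=response_b)},
    ]

# Inference would then follow the standard transformers chat pattern, e.g.:
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("Salesforce/FARE-8B")
#   model = AutoModelForCausalLM.from_pretrained("Salesforce/FARE-8B",
#                                                device_map="auto")
#   messages = build_pairwise_messages(instruction, resp_a, resp_b)
#   inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
#                                    return_tensors="pt")
#   verdict_ids = model.generate(inputs.to(model.device), max_new_tokens=1024)
```

The same message-building pattern extends to the other evaluation tasks (verification, single rating, step-level error identification) by swapping in that task's templates.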
Good For
- Automated AI Assistant Evaluation: Ideal for developers and researchers looking to automatically assess the quality and correctness of AI model outputs.
- Comparative Analysis: Excels at pairwise comparisons to determine which of two AI responses is superior.
- Error Identification: Capable of pinpointing specific errors at the step-level within a multi-step solution, particularly useful for mathematical or logical reasoning tasks.
- Research in Generative Evaluators: Serves as a foundational model for further research into scaling multi-task generative evaluator training.