Salesforce/FARE-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quantization: FP8 · Context Length: 32k · Published: Oct 21, 2025 · License: cc-by-nc-4.0 · Architecture: Transformer

Salesforce/FARE-8B is an 8-billion-parameter multi-task generative evaluator model, fine-tuned from Qwen-8B with a 32768-token context length. Developed by Salesforce, it handles evaluation tasks including pairwise comparison, step-level evaluation, and both reference-based and reference-free verification. The model is optimized for reasoning-centric domains, making it suitable for automated assessment of AI assistant responses.


Model Overview

Salesforce/FARE-8B is an 8-billion-parameter multi-task generative evaluator model, developed by Austin Xu, Xuan-Phi Nguyen, Yilun Zhou, Chien-Sheng Wu, Caiming Xiong, and Shafiq Joty. It is fine-tuned from the Qwen-8B base model and designed for automated evaluation in reasoning-centric domains. The model leverages large-scale multi-task, multi-domain data mixtures and rejection-sampling supervised fine-tuning (SFT) to achieve its specialized evaluation capabilities.

Key Capabilities

  • Multi-task Evaluation: Performs a range of evaluation tasks, including pairwise comparisons, step-level error identification, reference-based verification, reference-free verification, and single-rating assessment.
  • Reasoning-Centric: Optimized for evaluating responses in domains requiring strong reasoning, as detailed in its accompanying paper.
  • Prompt-Template Driven: Designed to be used with specific system and user prompt templates for each evaluation task, ensuring consistent and accurate assessments.
  • High Context Length: Supports a context length of 32768 tokens, allowing for evaluation of longer responses or complex interactions.

Good For

  • Automated AI Assistant Evaluation: Ideal for developers and researchers looking to automatically assess the quality and correctness of AI model outputs.
  • Comparative Analysis: Excels at pairwise comparisons to determine which of two AI responses is superior.
  • Error Identification: Capable of pinpointing specific errors at the step-level within a multi-step solution, particularly useful for mathematical or logical reasoning tasks.
  • Research in Generative Evaluators: Serves as a foundational model for further research into scaling multi-task generative evaluator training.