PatronusAI/glider

4B
BF16
4096
1
Dec 15, 2024
License: cc-by-nc-4.0
Hugging Face
Overview

Patronus GLIDER is a 4 billion parameter model developed by Patronus AI, fine-tuned from microsoft/Phi-3.5-mini-instruct. It is a general-purpose evaluation model: given a text-based output such as a conversation or the response of a Retrieval-Augmented Generation (RAG) system, it judges the output's quality and how well it adheres to stated criteria.

Key Capabilities

  • General-Purpose Evaluation: GLIDER can assess texts, conversations, and RAG outputs against arbitrary, user-defined criteria and rubric scales.
  • Domain Adaptation: Trained on a mix of synthetic and domain-adapted data drawn from datasets such as Mocha, FinQA, and Realtoxicity, covering more than 183 metrics and 685 domains (e.g., finance, medicine).
  • Multilingual Support: Primarily English, but also supports numerous other languages including Korean, Kazakh, Hindi, Bengali, Spanish, Indonesian, German, French, Arabic, Russian, Thai, Turkish, Ukrainian, and Romanian.
  • Extended Context: The maximum sequence length is 8192 tokens, though the model has been tested on longer inputs of up to 12,000 tokens.
  • Explainable Scoring: Designed to provide detailed reasoning, highlight important phrases, and assign an integer score based on a provided rubric.
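
The explainable-scoring output described above (reasoning, highlighted phrases, integer score) is easiest to use downstream once it is split into fields. The following is a minimal sketch that assumes the model wraps each part in <reasoning>, <highlight>, and <score> tags; those tag names, the Judgement class, and parse_judgement are illustrative assumptions, so check the official model card for the exact output format before relying on them.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Judgement:
    reasoning: str
    highlights: List[str]
    score: Optional[int]  # None when no integer score could be found


def _tag(text: str, name: str) -> str:
    """Return the stripped contents of the first <name>...</name> pair, or ''."""
    match = re.search(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else ""


def parse_judgement(raw: str) -> Judgement:
    """Split a raw evaluator response into reasoning, highlights, and score.

    Assumes (hypothetically) that the response uses <reasoning>, <highlight>,
    and <score> tags, with highlights separated by commas or newlines.
    """
    reasoning = _tag(raw, "reasoning")

    highlights: List[str] = []
    for piece in re.split(r"[,\n]", _tag(raw, "highlight")):
        phrase = piece.strip().strip("'\"[]").strip()
        if phrase:
            highlights.append(phrase)

    score_match = re.search(r"-?\d+", _tag(raw, "score"))
    score = int(score_match.group()) if score_match else None

    return Judgement(reasoning=reasoning, highlights=highlights, score=score)
```

Returning None for a missing score, rather than raising, lets a batch evaluation log malformed responses and continue.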

Good For

  • Automated Content Moderation: Evaluating text against specific guidelines or toxicity metrics.
  • RAG System Assessment: Judging the relevance and accuracy of retrieved contexts and generated responses.
  • Conversational AI Quality Assurance: Scoring dialogue coherence, helpfulness, or adherence to persona.
  • Custom Evaluation Tasks: Users can define their own pass_criteria and rubric to tailor the evaluation to a specific quality-control or assessment need.
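
To make the custom-evaluation workflow concrete, here is a minimal sketch of assembling a prompt from user-defined pass_criteria and a rubric. The function name build_eval_prompt, the tag layout, and the instruction wording are assumptions made for illustration, not the official GLIDER prompt template; the model card should be consulted for the exact template the model was trained on.

```python
from typing import Mapping


def build_eval_prompt(
    data: Mapping[str, str],
    pass_criteria: str,
    rubric: Mapping[int, str],
) -> str:
    """Assemble an evaluation prompt from data fields, pass criteria, and a rubric.

    The layout below is a hypothetical template, not the official one.
    `data` maps a field name (e.g. "context", "answer") to its text;
    `rubric` maps each integer score to a description of what earns it.
    """
    if not rubric:
        raise ValueError("rubric must define at least one score")

    # Wrap each data field in its own tag so the evaluator can tell them apart.
    data_block = "\n".join(
        f"<{name}>\n{text.strip()}\n</{name}>" for name, text in data.items()
    )
    # List rubric levels in ascending score order.
    rubric_block = "\n".join(
        f"{score}: {description.strip()}"
        for score, description in sorted(rubric.items())
    )

    return (
        "Evaluate the data below against the pass criteria and rubric.\n\n"
        f"<data>\n{data_block}\n</data>\n\n"
        f"<pass_criteria>\n{pass_criteria.strip()}\n</pass_criteria>\n\n"
        f"<rubric>\n{rubric_block}\n</rubric>\n\n"
        "Give step-by-step reasoning, highlight the important phrases, "
        "and finish with an integer score from the rubric."
    )
```

A binary faithfulness check for a RAG answer, for example, could pass data={"context": ..., "answer": ...}, a one-sentence pass_criteria, and rubric={0: "...", 1: "..."}; the resulting string is then sent to the model as the user message.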