Overview
Patronus GLIDER is a 4-billion-parameter model developed by Patronus AI, fine-tuned from microsoft/Phi-3.5-mini-instruct. It is a general-purpose evaluation (judge) model: given text-based outputs such as plain text, conversations, or the responses of Retrieval-Augmented Generation (RAG) systems, it judges their quality and how well they adhere to user-supplied criteria.
Key Capabilities
- General-Purpose Evaluation: GLIDER can assess texts, conversations, and RAG outputs against arbitrary, user-defined criteria and rubric scales.
- Domain Adaptation: Trained on a combination of synthetic and domain-adapted data from datasets like Mocha, FinQA, and Realtoxicity, covering over 183 metrics and 685 domains (e.g., finance, medicine).
- Multilingual Support: Primarily English, but also supports numerous other languages including Korean, Kazakh, Hindi, Bengali, Spanish, Indonesian, German, French, Arabic, Russian, Thai, Turkish, Ukrainian, and Romanian.
- Extended Context: The maximum sequence length is 8,192 tokens, though the model has been tested on longer inputs of up to 12,000 tokens.
- Explainable Scoring: Designed to provide detailed reasoning, highlight important phrases, and assign an integer score based on a provided rubric.
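The explainable-scoring output described above (reasoning, highlighted phrases, integer score) can be turned into structured data with a small parser. The sketch below is only an illustration: it assumes the model wraps each part in `<reasoning>`, `<highlight>`, and `<score>` tags, which is not stated in this overview and should be checked against the official prompt template. The names `JudgeResult` and `parse_judge_output` are hypothetical helpers, not part of any Patronus library.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JudgeResult:
    reasoning: str
    highlights: List[str]
    score: Optional[int]  # None when no integer score could be found


def _extract_tag(text: str, name: str) -> str:
    """Return the stripped contents of <name>...</name>, or "" if absent."""
    match = re.search(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else ""


def parse_judge_output(text: str) -> JudgeResult:
    """Parse a judge response into reasoning, highlights, and score.

    ASSUMPTION: the response uses <reasoning>, <highlight>, and <score>
    tags, and highlights are a comma-separated (optionally bracketed) list.
    """
    reasoning = _extract_tag(text, "reasoning")

    raw_highlights = _extract_tag(text, "highlight")
    # Drop list punctuation and quotes around each highlighted phrase.
    highlights = [
        part.strip(" '\"[]")
        for part in raw_highlights.split(",")
        if part.strip(" '\"[]")
    ]

    score_match = re.search(r"-?\d+", _extract_tag(text, "score"))
    score = int(score_match.group()) if score_match else None

    return JudgeResult(reasoning=reasoning, highlights=highlights, score=score)
```

Returning `None` for a missing score, rather than a default such as 0, keeps a malformed response from being mistaken for a failing grade.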
Good For
- Automated Content Moderation: Evaluating text against specific guidelines or toxicity metrics.
- RAG System Assessment: Judging the relevance and accuracy of retrieved contexts and generated responses.
- Conversational AI Quality Assurance: Scoring dialogue coherence, helpfulness, or adherence to persona.
- Custom Evaluation Tasks: Users can define their own pass_criteria and rubric to tailor the model's evaluation to specific needs, making it highly adaptable for various quality control and assessment scenarios.
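As a rough sketch of what defining a custom pass_criteria and rubric might look like, the helper below assembles the data section of an evaluation prompt. The function name `build_eval_prompt`, the `<CONTEXT>`, `<USER INPUT>`, and `<MODEL OUTPUT>` tags, and the "Pass Criteria:" and "Rubric:" labels are assumptions made for illustration; the official prompt template, including its instruction preamble, should be taken from the model card.

```python
from typing import Dict, Optional


def build_eval_prompt(
    pass_criteria: str,
    rubric: Dict[int, str],
    user_input: str,
    model_output: str,
    context: Optional[str] = None,
) -> str:
    """Assemble the data portion of an evaluation prompt.

    ASSUMPTION: the tag names and section labels below are illustrative
    and may differ from the template the model was trained on.
    """
    if not rubric:
        raise ValueError("rubric must define at least one score level")

    sections = []
    if context is not None:
        # Retrieved context is only relevant for RAG-style evaluations.
        sections.append(f"<CONTEXT>\n{context}\n</CONTEXT>")
    sections.append(f"<USER INPUT>\n{user_input}\n</USER INPUT>")
    sections.append(f"<MODEL OUTPUT>\n{model_output}\n</MODEL OUTPUT>")

    # One line per integer score, listed in ascending order.
    rubric_lines = "\n".join(
        f"{score}: {description}" for score, description in sorted(rubric.items())
    )
    sections.append(f"Pass Criteria:\n{pass_criteria}")
    sections.append(f"Rubric:\n{rubric_lines}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(
        build_eval_prompt(
            pass_criteria="Is the answer fully supported by the context?",
            rubric={0: "Not supported by the context.", 1: "Fully supported."},
            user_input="When was the company founded?",
            model_output="It was founded in 1998.",
            context="The company was founded in 1998 in Austin.",
        )
    )
```

Keeping the rubric as a mapping from integer score to description makes the scale explicit and lets the same helper serve both binary pass/fail checks and multi-point scales.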