PatronusAI/glider
PatronusAI/glider is a 4 billion parameter language model fine-tuned from Microsoft's Phi-3.5-mini-instruct, developed by Patronus AI. This model is specifically designed for general-purpose evaluation, capable of judging texts, conversations, and RAG setups based on user-defined criteria and rubrics. It was trained on a diverse dataset covering over 183 metrics and 685 domains, including finance and medicine, and supports a maximum sequence length of 8192 tokens, with tested support up to 12,000 tokens. Its primary strength lies in providing detailed, explainable evaluations for various AI outputs.
Overview
Patronus GLIDER is a 4 billion parameter model developed by Patronus AI, fine-tuned from microsoft/Phi-3.5-mini-instruct. It is a general-purpose evaluation model: given a text-based output, such as a standalone text, a conversation, or the output of a Retrieval-Augmented Generation (RAG) system, it judges the output's quality and its adherence to the criteria supplied by the user.
Key Capabilities
- General-Purpose Evaluation: GLIDER can assess texts, conversations, and RAG outputs against arbitrary, user-defined criteria and rubric scales.
- Domain Adaptation: Trained on a combination of synthetic and domain-adapted data from datasets like Mocha, FinQA, and Realtoxicity, covering over 183 metrics and 685 domains (e.g., finance, medicine).
- Multilingual Support: Primarily English, but also supports numerous other languages including Korean, Kazakh, Hindi, Bengali, Spanish, Indonesian, German, French, Arabic, Russian, Thai, Turkish, Ukrainian, and Romanian.
- Extended Context: While the maximum sequence length is 8192 tokens, the model has been tested to support longer texts, up to 12,000 tokens.
- Explainable Scoring: Designed to provide detailed reasoning, highlight important phrases, and assign an integer score based on a provided rubric.
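Because the model returns reasoning, highlighted phrases, and an integer score together, downstream code usually needs to split a raw completion into those three parts. The sketch below shows one way to do that in plain Python. It assumes the completion wraps each part in `<reasoning>`, `<highlight>`, and `<score>` tags; those tag names, the `JudgeResult` type, and the `parse_judge_output` helper are illustrative assumptions and should be checked against the official model card before use.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JudgeResult:
    reasoning: str
    highlights: List[str]
    score: Optional[int]  # None when no integer score could be found


def _extract(tag: str, text: str) -> str:
    """Return the text between <tag> and </tag>, or "" if the tag is absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else ""


def parse_judge_output(raw: str) -> JudgeResult:
    """Split a raw completion into reasoning, highlights, and score.

    Assumption: the three parts are wrapped in <reasoning>, <highlight>,
    and <score> tags. Adjust the tag names if the real format differs.
    """
    reasoning = _extract("reasoning", raw)
    # Highlights are assumed to be separated by commas or newlines.
    highlight_block = _extract("highlight", raw)
    highlights = [h.strip() for h in re.split(r"[,\n]", highlight_block) if h.strip()]
    # Take the first integer inside the score block, if any.
    score_match = re.search(r"-?\d+", _extract("score", raw))
    score = int(score_match.group()) if score_match else None
    return JudgeResult(reasoning, highlights, score)


# A hand-written sample completion, not real model output.
sample = """<reasoning>
- The answer cites the retrieved passage.
</reasoning>
<highlight>
retrieved passage, cites
</highlight>
<score>
4
</score>"""

result = parse_judge_output(sample)
print(result.score, result.highlights)
```

Returning `None` for a missing score, rather than raising, lets a batch evaluation log malformed completions and continue.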
Good For
- Automated Content Moderation: Evaluating text against specific guidelines or toxicity metrics.
- RAG System Assessment: Judging the relevance and accuracy of retrieved contexts and generated responses.
- Conversational AI Quality Assurance: Scoring dialogue coherence, helpfulness, or adherence to persona.
- Custom Evaluation Tasks: Users can define their own pass_criteria and rubric to tailor the model's evaluation to specific needs, making it highly adaptable for various quality control and assessment scenarios.
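To make the idea of a user-defined pass_criteria and rubric concrete, here is a minimal sketch of assembling an evaluation prompt from those two pieces plus the data to be judged. The template wording, the `build_eval_prompt` helper, and the `CONTEXT` / `RESPONSE` field names are hypothetical and chosen only for illustration; the model was trained on a specific prompt format, so the official model card's template should be used in practice.

```python
from typing import Dict


def build_eval_prompt(pass_criteria: str, rubric: Dict[int, str], data: Dict[str, str]) -> str:
    """Assemble an evaluation prompt (hypothetical template, not the official one).

    pass_criteria: plain-language statement of what a passing output looks like.
    rubric: maps each integer score to its description.
    data: named pieces of text to judge, e.g. retrieved context and response.
    """
    # One rubric line per score, lowest score first.
    rubric_lines = "\n".join(f"{score}: {desc}" for score, desc in sorted(rubric.items()))
    # Wrap each piece of data in a tag named after its key.
    data_block = "\n".join(f"<{name}>\n{text}\n</{name}>" for name, text in data.items())
    return (
        "Score the data below against the pass criteria using the rubric.\n\n"
        f"{data_block}\n\n"
        f"Pass criteria:\n{pass_criteria}\n\n"
        f"Rubric:\n{rubric_lines}\n\n"
        "Give your reasoning, the most important phrases, and a final integer score."
    )


prompt = build_eval_prompt(
    pass_criteria="Is the response fully supported by the context?",
    rubric={
        0: "The response contradicts or is unsupported by the context.",
        1: "The response is fully supported by the context.",
    },
    data={
        "CONTEXT": "The invoice was issued on 3 March.",
        "RESPONSE": "The invoice date is 3 March.",
    },
)
print(prompt)
```

Keeping the criteria and rubric as function arguments means the same helper can serve moderation, RAG, and dialogue checks by swapping in different text.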