PatronusAI/Llama-3-Patronus-Lynx-70B-Instruct is a 70 billion parameter model fine-tuned from Llama-3-Instruct by Patronus AI. It is specifically designed and optimized for evaluating hallucinations in RAG (Retrieval Augmented Generation) settings. It excels at determining the faithfulness of an answer to a given document, outperforming several larger commercial models on hallucination detection benchmarks.
Overview
PatronusAI/Llama-3-Patronus-Lynx-70B-Instruct is a 70 billion parameter model developed by Patronus AI, fine-tuned from meta-llama/Meta-Llama-3-70B-Instruct. Its primary purpose is to serve as an open-source hallucination evaluation model, specifically designed for RAG (Retrieval Augmented Generation) applications. The model was trained on a diverse mix of datasets, including CovidQA, PubmedQA, DROP, and RAGTruth, incorporating both hand-annotated and synthetic data, with a maximum sequence length of 8000 tokens.
Key Capabilities
- Hallucination Detection: Evaluates whether an AI-generated answer is faithful to a provided document, ensuring no new information is introduced or contradictions occur.
- Faithfulness Scoring: Outputs a clear "PASS" or "FAIL" verdict, along with detailed reasoning, in a structured JSON format.
- High Performance: Achieves an overall score of 87.4% on the HaluBench evaluation, surpassing models like GPT-4o, GPT-4-Turbo, GPT-3.5-Turbo, and Claude-3-Sonnet in hallucination detection.
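The PASS/FAIL verdict described above can be consumed programmatically. Below is a minimal sketch of how an evaluation prompt might be assembled and the model's JSON reply parsed; the template text, key names (`REASONING`, `SCORE`), and the `build_prompt`/`parse_verdict` helpers are illustrative assumptions, not the exact template shipped with the model, so check the official model card for the canonical prompt.

```python
import json

# Illustrative template following the QUESTION / DOCUMENT / ANSWER layout
# typical of Lynx-style faithfulness evaluation. The exact wording used by
# the released model may differ.
PROMPT_TEMPLATE = """\
Given the following QUESTION, DOCUMENT and ANSWER, determine whether the
ANSWER is faithful to the contents of the DOCUMENT. The ANSWER must not
introduce new information or contradict the DOCUMENT.

QUESTION: {question}

DOCUMENT: {document}

ANSWER: {answer}

Respond in JSON with the keys "REASONING" and "SCORE" (PASS or FAIL).
"""


def build_prompt(question: str, document: str, answer: str) -> str:
    """Fill the evaluation template with one RAG example."""
    return PROMPT_TEMPLATE.format(
        question=question, document=document, answer=answer
    )


def parse_verdict(raw: str) -> tuple[str, object]:
    """Extract the PASS/FAIL score and reasoning from the model's JSON reply."""
    result = json.loads(raw)
    score = str(result["SCORE"]).strip().upper()
    if score not in ("PASS", "FAIL"):
        raise ValueError(f"unexpected score: {score!r}")
    return score, result["REASONING"]
```

The prompt string would be sent to the model through whatever inference endpoint you use (e.g. an OpenAI-compatible chat completions API), and `parse_verdict` then turns the structured reply into a simple PASS/FAIL signal for your RAG pipeline.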
Good for
- Developers building RAG systems who need to programmatically assess the factual consistency of generated responses against source documents.
- Researchers and practitioners focused on improving the reliability and trustworthiness of large language models by identifying and mitigating hallucinations.
- Applications requiring automated quality control for AI-generated content where factual accuracy is paramount.