PatronusAI/Llama-3-Patronus-Lynx-70B-Instruct

Status: Warm
Visibility: Public
Parameters: 70B
Precision: FP8
Context length: 8192
License: cc-by-nc-4.0
Hugging Face
Overview

PatronusAI/Llama-3-Patronus-Lynx-70B-Instruct is a 70-billion-parameter model developed by Patronus AI, fine-tuned from meta-llama/Meta-Llama-3-70B-Instruct. It is an open-source hallucination evaluation model designed for Retrieval-Augmented Generation (RAG) applications. The model was trained on a diverse mix of datasets, including CovidQA, PubMedQA, DROP, and RAGTruth, incorporating both hand-annotated and synthetic data, with a maximum sequence length of 8000 tokens.
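
In practice the model is used as an LLM judge: you place the question, the retrieved document, and the generated answer into a single evaluation prompt and the model returns its verdict. The sketch below is a minimal example using the Hugging Face transformers library (assuming sufficient GPU memory for a 70B model); the prompt text is a simplified paraphrase of the template published on the Hugging Face model card, and the "REASONING"/"SCORE" key names and generation settings are illustrative assumptions, so consult the model card for the exact wording.

```python
# Minimal sketch: querying Lynx 70B as a hallucination judge with transformers.
# The evaluation prompt is a simplified paraphrase of the model card's template;
# check the Hugging Face model card for the exact prompt and key names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "PatronusAI/Llama-3-Patronus-Lynx-70B-Instruct"

PROMPT_TEMPLATE = """Given the following QUESTION, DOCUMENT and ANSWER, determine whether the ANSWER is faithful to the DOCUMENT. The ANSWER must not introduce new information or contradict the DOCUMENT.

--
QUESTION:
{question}

--
DOCUMENT:
{document}

--
ANSWER:
{answer}

--

Return your output in JSON format with the keys "REASONING" and "SCORE", where SCORE is "PASS" or "FAIL".
"""

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def evaluate(question: str, document: str, answer: str) -> str:
    """Run one faithfulness evaluation and return the raw model output."""
    prompt = PROMPT_TEMPLATE.format(question=question, document=document, answer=answer)
    messages = [{"role": "user", "content": prompt}]
    # The base Llama-3 Instruct chat template is applied before generation.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
    # Strip the prompt tokens and return only the generated verdict.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```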

Key Capabilities

  • Hallucination Detection: Evaluates whether an AI-generated answer is faithful to a provided document, ensuring no new information is introduced or contradictions occur.
  • Faithfulness Scoring: Outputs a clear "PASS" or "FAIL" verdict, along with detailed reasoning, in a structured JSON format (see the parsing sketch after this list).
  • High Performance: Achieves an overall score of 87.4% on the HaluBench evaluation, surpassing models like GPT-4o, GPT-4-Turbo, GPT-3.5-Turbo, and Claude-3-Sonnet in hallucination detection.
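
Because the verdict comes back as JSON, it can be turned directly into a boolean gate for a RAG pipeline. A small sketch, assuming the "REASONING"/"SCORE" key names used in the example above and the hypothetical `evaluate` helper sketched earlier:

```python
import json

def is_faithful(raw_output: str) -> tuple[bool, list[str]]:
    """Parse the judge's JSON verdict into (passed, reasoning).

    Assumes the output uses the "REASONING" and "SCORE" keys shown in the
    earlier sketch; adjust if the model card's template differs.
    """
    verdict = json.loads(raw_output)
    return verdict["SCORE"].strip().upper() == "PASS", verdict.get("REASONING", [])

# Hypothetical usage with the evaluate() helper from the previous sketch:
# passed, reasoning = is_faithful(evaluate(question, document, answer))
# if not passed:
#     ...  # e.g. retry generation or flag the response for review
```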

Good for

  • Developers building RAG systems who need to programmatically assess the factual consistency of generated responses against source documents.
  • Researchers and practitioners focused on improving the reliability and trustworthiness of large language models by identifying and mitigating hallucinations.
  • Applications requiring automated quality control for AI-generated content where factual accuracy is paramount.