grounded-ai/phi3-hallucination-judge-merge
grounded-ai/phi3-hallucination-judge-merge is a 4 billion parameter PEFT adapter model for detecting hallucinations in language model outputs. It performs binary classification, identifying when an LLM response is coherent but factually incorrect or nonsensical relative to a given context. It achieves an F1 score of 0.81 on its hallucination detection benchmark, outperforming several larger models on this specific task. Its primary use case is to serve as a judge for evaluating the factual grounding of other LLMs.
Model Overview
grounded-ai/phi3-hallucination-judge-merge is a 4 billion parameter PEFT (Parameter-Efficient Fine-Tuning) adapter model developed specifically for hallucination detection in large language model (LLM) outputs. It functions as a binary classifier that decides whether an LLM's response is a hallucination, defined here as a coherent but factually incorrect or nonsensical output that is not grounded in the provided context.
Key Capabilities
- Hallucination Detection: Excels at identifying factually incorrect or ungrounded responses from other LLMs.
- High F1 Score: Achieves an F1 score of 0.81 on its hallucination detection benchmark, demonstrating strong performance in balancing precision and recall.
- Comparative Performance: Outperforms several larger, well-known models such as GPT-3.5, Gemini Pro, and PaLM 2 (Text Bison) on this specialized task, and matches GPT-4 Turbo's F1 score.
- Efficient Evaluation: Designed to be integrated into evaluation pipelines to automatically assess the factual accuracy of LLM generations.
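To make the F1 figure above concrete, the following is a minimal sketch of how precision, recall, and F1 are computed for a binary hallucination label. The function name `precision_recall_f1`, the `"yes"`/`"no"` label strings, and the toy data are illustrative assumptions; they are not taken from the model's benchmark.

```python
def precision_recall_f1(y_true, y_pred, positive="yes"):
    """Compute precision, recall, and F1 for one positive class.

    Labels are assumed to be plain strings; "yes" (hallucination) is an
    illustrative choice, not the model's documented output format.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: 3 true hallucinations, the judge catches 2 and raises 1 false alarm.
gold = ["yes", "yes", "no", "no", "yes"]
pred = ["yes", "no", "no", "yes", "yes"]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, a judge only scores well when it both catches most hallucinations (recall) and avoids flagging grounded answers (precision).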
Recommended Use Cases
- LLM Evaluation: Ideal for developers and researchers needing to automatically score the factual consistency of their LLM outputs against a given reference and query.
- Quality Assurance: Can be used in production systems to flag potentially hallucinated content generated by LLMs before it reaches end-users.
- Research: Provides a robust tool for studying and mitigating hallucination phenomena in language models.
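The quality-assurance use case above could be wired up roughly as follows. This is a sketch under stated assumptions, not the model's documented interface: the `judge` callable, the `"yes"`/`"no"` verdict strings, and the record keys `query`, `reference`, and `response` are all hypothetical, and `stub_judge` is a toy stand-in so the example runs without downloading any weights. In practice the callable would wrap the actual model using the prompt format given in its model card.

```python
from typing import Callable, Dict, List

# Hypothetical judge signature: (query, reference, response) -> raw verdict text.
Judge = Callable[[str, str, str], str]


def parse_verdict(raw: str) -> bool:
    """Return True if the judge's reply marks the response as a hallucination.

    Assumes the reply starts with "yes" or "no"; this is an assumption
    about the output format, so anything else is rejected loudly.
    """
    text = raw.strip().lower()
    if text.startswith("yes"):
        return True
    if text.startswith("no"):
        return False
    raise ValueError(f"Unrecognized judge verdict: {raw!r}")


def flag_hallucinations(records: List[Dict[str, str]], judge: Judge) -> List[Dict[str, str]]:
    """Return the records whose response the judge considers ungrounded."""
    flagged = []
    for rec in records:
        raw = judge(rec["query"], rec["reference"], rec["response"])
        if parse_verdict(raw):
            flagged.append(rec)
    return flagged


def stub_judge(query: str, reference: str, response: str) -> str:
    """Toy stand-in for the real model: substring match against the reference."""
    return "no" if response.lower() in reference.lower() else "yes"


records = [
    {"query": "Where is the Eiffel Tower?",
     "reference": "The Eiffel Tower is in Paris.",
     "response": "Paris"},
    {"query": "Where is the Eiffel Tower?",
     "reference": "The Eiffel Tower is in Paris.",
     "response": "Berlin"},
]
flagged = flag_hallucinations(records, stub_judge)
print([rec["response"] for rec in flagged])
```

Keeping the judge behind a plain callable means the same gating logic can be tested with a stub and later pointed at the real model without changes.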
Training Details
The model was trained with a learning rate of 0.0001 and a batch size of 2, giving a total batch size of 8 with gradient accumulation, for 150 training steps. It was built with the PEFT, Transformers, and PyTorch libraries.
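The relationship between these numbers can be checked with a little arithmetic. The sketch below assumes training on a single device and assumes that the 150 steps are optimizer steps (one per accumulated batch); neither detail is stated in the source, and the variable names are illustrative.

```python
per_device_batch_size = 2   # from the training details
total_batch_size = 8        # from the training details
training_steps = 150        # from the training details
num_devices = 1             # ASSUMPTION: single-device training

# Micro-batches accumulated before each optimizer update.
gradient_accumulation_steps = total_batch_size // (per_device_batch_size * num_devices)

# Upper bound on examples processed, ASSUMING each step is one optimizer update.
examples_seen = total_batch_size * training_steps

print(gradient_accumulation_steps)  # 4
print(examples_seen)                # 1200
```

Under these assumptions the run is short, on the order of a thousand examples, which is consistent with a lightweight adapter fine-tune rather than full training.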