Name: grounded-ai/phi3-hallucination-judge-merge API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: grounded-ai

Model Overview

grounded-ai/phi3-hallucination-judge-merge is a 4-billion-parameter PEFT (Parameter-Efficient Fine-Tuning) adapter model developed for hallucination detection in large language model (LLM) outputs. It functions as a binary classifier that determines whether an LLM's response is a hallucination, defined here as a coherent but factually incorrect or nonsensical output that is not grounded in the provided context.

Key Capabilities

Hallucination Detection: Identifies factually incorrect or ungrounded responses produced by other LLMs.
High F1 Score: Achieves an F1 score of 0.81 on its hallucination detection benchmark. F1 is the harmonic mean of precision and recall, so a high value requires both to be strong.
Comparative Performance: Outperforms several larger, well-known models, including GPT-3.5, Gemini Pro, and PaLM 2 (Text Bison), on this specialized task, and matches GPT-4 Turbo's F1 score.
Efficient Evaluation: Designed to be integrated into evaluation pipelines to automatically assess the factual accuracy of LLM generations.
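
To make the F1 metric above concrete, here is a minimal sketch of how it is computed from confusion-matrix counts when "hallucination" is the positive class. The counts are illustrative only and are not the model's benchmark results.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for the positive ("hallucination") class."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0  # no positive predictions or no positive labels
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts (made up for this example).
balanced = f1_score(true_positives=8, false_positives=2, false_negatives=2)
# High precision (0.9) but weaker recall (9/14) pulls F1 down to about 0.75.
imbalanced = f1_score(true_positives=9, false_positives=1, false_negatives=5)
print(round(balanced, 3), round(imbalanced, 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a judge that flags very few responses (high precision, low recall) cannot reach a high F1.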

Recommended Use Cases

LLM Evaluation: Ideal for developers and researchers needing to automatically score the factual consistency of their LLM outputs against a given reference and query.
Quality Assurance: Can be used in production systems to flag potentially hallucinated content generated by LLMs before it reaches end-users.
Research: Provides a robust tool for studying and mitigating hallucination phenomena in language models.
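
The evaluation and quality-assurance use cases above could be wired up roughly as follows. This is a sketch under stated assumptions: `Sample`, `build_prompt`, `parse_verdict`, and `flag_hallucinations` are hypothetical names, the prompt layout is invented for illustration (the real template is defined in the model card), and `stub_judge` is a toy stand-in for a call to the model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    query: str
    reference: str
    response: str


def build_prompt(sample: Sample) -> str:
    # Hypothetical prompt layout, not the model's actual template.
    return (
        "Decide whether the response is grounded in the reference.\n"
        f"Query: {sample.query}\n"
        f"Reference: {sample.reference}\n"
        f"Response: {sample.response}\n"
        "Is the response a hallucination? Answer yes or no."
    )


def parse_verdict(raw_output: str) -> bool:
    """Return True when the judge's text output starts with 'yes' (hallucination)."""
    return raw_output.strip().lower().startswith("yes")


def flag_hallucinations(samples: List[Sample], judge: Callable[[str], str]) -> List[bool]:
    """Run every sample through the judge and collect binary verdicts."""
    return [parse_verdict(judge(build_prompt(s))) for s in samples]


def stub_judge(prompt: str) -> str:
    # Toy stand-in so the sketch runs without the model: answers "no" only
    # when the response text appears verbatim in the reference.
    fields = dict(line.split(": ", 1) for line in prompt.splitlines() if ": " in line)
    return "no" if fields["Response"] in fields["Reference"] else "yes"


samples = [
    Sample("What is the capital of France?", "Paris is the capital of France.", "Paris"),
    Sample("What is the capital of France?", "Paris is the capital of France.", "Lyon"),
]
flags = flag_hallucinations(samples, stub_judge)
print(flags)
```

In a real pipeline, `judge` would wrap an inference call to the hosted model, and flagged responses could be blocked or routed for review before reaching end users.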

Training Details

The model was trained with a learning rate of 0.0001, a batch size of 2 (total batch size of 8 with gradient accumulation), and 150 training steps. It was built with the PEFT, Transformers, and PyTorch libraries.
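
The batch-size figures above imply a gradient accumulation factor, which the short calculation below works out. It assumes training on a single device and that each of the 150 steps is one optimizer update on a full batch of 8; neither assumption is stated in the source.

```python
# Values taken from the training details above.
learning_rate = 1e-4
per_device_batch_size = 2
total_batch_size = 8
training_steps = 150

# Assumption: a single training device (not stated in the source).
num_devices = 1

# Micro-batches accumulated before each optimizer update.
accumulation_steps = total_batch_size // (per_device_batch_size * num_devices)

# Assumption: each training step is one optimizer update on a full total batch.
examples_seen = total_batch_size * training_steps

print(accumulation_steps, examples_seen)
```

Under these assumptions, gradients from 4 micro-batches are accumulated per update, and the run processes 1,200 training examples in total (counting repeats if the dataset is smaller).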
