oumi-ai/HallOumi-8B

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Mar 11, 2025 · License: cc-by-nc-4.0 · Architecture: Transformer

HallOumi-8B by oumi-ai is an 8-billion-parameter hallucination detection model, fine-tuned from Llama-3.1-8B-Instruct, with a 32,768-token context length. It verifies content sentence by sentence, providing support determinations, confidence scores, relevant citations, and human-readable explanations. The model is designed to build trust in AI systems by verifying outputs against a known context, and it outperforms larger models such as DeepSeek R1 and Claude Sonnet 3.5 on F1 score for hallucination detection.


HallOumi-8B: A Specialized Hallucination Detection Model

HallOumi-8B, developed by Oumi AI, is an 8-billion-parameter model designed for state-of-the-art hallucination detection. It outperforms substantially larger and closed-source models such as DeepSeek R1 (671B), Claude Sonnet 3.5, OpenAI o1, and Google Gemini 1.5 Pro, achieving a macro F1 score of 77.2% ± 2.2%.

Key Capabilities

  • Per-sentence verification: Analyzes content (AI or human-generated) at a granular sentence level.
  • Contextual support determination: Identifies whether a statement is supported or unsupported by provided context, along with a confidence score.
  • Sentence-level citations: Provides relevant context sentences to justify its determination.
  • Human-readable explanations: Offers explanations for why a claim is supported or unsupported, aiding human review.
  • Trust building: Aims to address the critical issue of AI hallucinations by enabling verifiable and traceable outputs.
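Downstream code typically consumes these per-sentence verdicts as structured records: one entry per claim, carrying the support determination, confidence score, citations, and explanation. A minimal sketch of such a record and parser is below; note that the line-oriented `|`-delimited format here is an illustrative assumption, not HallOumi's actual output format.

```python
from dataclasses import dataclass


@dataclass
class ClaimVerdict:
    claim: str            # the sentence being verified
    supported: bool       # support determination
    confidence: float     # confidence score in [0.0, 1.0]
    citations: list[str]  # ids of relevant context sentences
    explanation: str      # human-readable justification


def parse_verdicts(raw: str) -> list[ClaimVerdict]:
    """Parse a hypothetical line-oriented verdict format:
    <claim>|supported|0.95|s1,s3|<explanation>
    (one line per analyzed sentence)."""
    verdicts = []
    for line in raw.strip().splitlines():
        claim, label, conf, cites, expl = line.split("|")
        verdicts.append(ClaimVerdict(
            claim=claim,
            supported=(label == "supported"),
            confidence=float(conf),
            citations=cites.split(",") if cites else [],
            explanation=expl,
        ))
    return verdicts
```

A verdict structured this way lets a reviewer jump straight from an unsupported claim to the cited context sentences.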

Use Cases

  • Claim verification: Ideal for scenarios where a known source of truth is available to verify claims.
  • Enhancing AI trust: Helps in safely and responsibly deploying generative models by ensuring output veracity.
  • Mitigating risks: Addresses issues like AI-generated misinformation in legal, customer service, and general information contexts.
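In a deployment pipeline, these use cases usually reduce to a gating decision: flag or block a response when any of its sentences is judged unsupported with sufficient confidence. A minimal sketch, assuming verdicts are available as (supported, confidence) pairs:

```python
def should_flag(verdicts: list[tuple[bool, float]],
                threshold: float = 0.5) -> bool:
    """Flag a response for human review if any claim is judged
    unsupported with confidence at or above `threshold`."""
    return any(not supported and conf >= threshold
               for supported, conf in verdicts)
```

The threshold trades off false alarms against missed hallucinations; a lower threshold is appropriate in high-stakes settings such as legal or customer-service contexts.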

This model is fine-tuned from Llama-3.1-8B-Instruct and was trained on a mix of synthetic data and subsets of the ANLI and C2D/D2C datasets. It is intended for claim verification against a provided context and should not be used outside that scope, given the inherent limitations of smaller LLMs.
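Since the model is a standard Llama-3.1 fine-tune, it can be run with the Hugging Face `transformers` text-generation pipeline. The sketch below assembles a context/request/response prompt and runs the model; the `<|context|>`-style template is an illustrative assumption, and the exact prompt format should be taken from the oumi-ai/HallOumi-8B model card.

```python
def build_halloumi_prompt(context: str, request: str, response: str) -> str:
    """Assemble a claim-verification prompt. NOTE: this template is an
    illustrative assumption; use the exact prompt format from the
    oumi-ai/HallOumi-8B model card in practice."""
    return (
        f"<|context|>\n{context}\n"
        f"<|request|>\n{request}\n"
        f"<|response|>\n{response}\n"
    )


if __name__ == "__main__":
    # Running the 8B model locally requires `transformers` installed
    # and enough GPU memory for an 8B checkpoint.
    from transformers import pipeline

    verifier = pipeline(
        "text-generation",
        model="oumi-ai/HallOumi-8B",  # fine-tuned from Llama-3.1-8B-Instruct
    )
    prompt = build_halloumi_prompt(
        context="The Eiffel Tower is 330 metres tall.",
        request="How tall is the Eiffel Tower?",
        response="The Eiffel Tower is 500 metres tall.",
    )
    print(verifier(prompt, max_new_tokens=512)[0]["generated_text"])
```

The generated text contains the per-sentence determinations, confidence scores, citations, and explanations described above.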

Popular Sampler Settings

The parameter combinations most used by Featherless users for this model cover the following sampler settings: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.