future7/CogniDet

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Architecture: Transformer

CogniDet by future7 is an 8-billion-parameter Llama3-backbone model designed to detect both factual and cognitive hallucinations in Large Language Model (LLM) outputs. It specializes in identifying unsupported inferences and evaluations, going beyond simple factual contradictions. Trained on the large-scale CogniBench-L dataset, CogniDet offers efficient, single-pass detection, making it well suited to robust LLM evaluation, particularly for English knowledge-grounded dialogues.


What is this model about?

CogniDet is an 8 billion parameter model built on the Llama3 architecture, developed by future7 as part of the CogniBench framework. Its primary purpose is to detect hallucinations in Large Language Model (LLM) outputs. Unlike many other models that only focus on factual inaccuracies, CogniDet is specifically designed to identify both factual hallucinations (claims contradicting provided context) and cognitive hallucinations (unsupported inferences or evaluations).
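Because CogniDet runs as a single-pass detector, the grounding context and the response under evaluation have to be packed into one input. The exact prompt template the model expects is not documented here, so the helper below (`build_detection_prompt` and its section headers and label names) is a hypothetical sketch of what such an input might look like; a real integration would use the checkpoint's own chat template.

```python
# Hypothetical sketch: packing context + response into one prompt so a
# single forward pass can label each response sentence. The section
# headers and label vocabulary below are assumptions, not the model's
# documented template.

def build_detection_prompt(context: str, response: str) -> str:
    """Combine grounding context and an LLM response into a single
    detection prompt asking for per-sentence hallucination labels."""
    return (
        "### Context\n"
        f"{context.strip()}\n\n"
        "### Response\n"
        f"{response.strip()}\n\n"
        "### Task\n"
        "Label each response sentence as FAITHFUL, FACTUAL_HALLUCINATION, "
        "or COGNITIVE_HALLUCINATION."
    )

prompt = build_detection_prompt(
    context="The bridge opened in 1937 and spans 2.7 km.",
    response=(
        "The bridge opened in 1937. "
        "It is clearly the most beautiful bridge ever built."
    ),
)
print(prompt.splitlines()[0])  # ### Context
```

Note how the second response sentence is an unsupported evaluation rather than a factual contradiction: that is exactly the cognitive-hallucination case CogniDet targets.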

What makes THIS different from all the other models?

CogniDet stands out due to its dual detection capability and its foundation in a legal-inspired evaluation framework. It offers:

  • Comprehensive Hallucination Detection: It doesn't just check facts; it assesses the logical grounding of inferences, a critical aspect often missed by simpler detectors.
  • Efficient Inference: Leveraging an 8B Llama3 backbone, it performs single-pass detection, which is faster than traditional NLI-based methods.
  • Robust Training: Trained on the extensive CogniBench-L dataset (24k+ dialogues, 234k+ annotated sentences) for broad coverage and accuracy.
  • Superior Performance: Achieves an F1 score of 73.80 for cognitive hallucination detection, outperforming baselines like SelfCheckGPT and RAGTruth.
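To put the reported metric in perspective: F1 is the harmonic mean of precision and recall, so a score of 73.80 can arise from different precision/recall trade-offs. The specific precision and recall values below are illustrative, not figures reported for CogniDet.

```python
# F1 is the harmonic mean of precision and recall.
def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

# Balanced case: precision = recall = 0.738 gives F1 = 0.738.
print(round(f1_score(0.738, 0.738), 3))  # 0.738

# An unbalanced pair can yield roughly the same F1
# (illustrative numbers only):
print(round(f1_score(0.900, 0.625), 3))  # 0.738
```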

Should I use this for my use case?

You should consider using CogniDet if your use case involves:

  • Evaluating LLM outputs for reliability: Especially when the LLM is expected to make inferences or evaluations based on provided context.
  • Ensuring cognitive faithfulness: If it's crucial that your LLM's reasoning is sound and supported, not just factually correct.
  • Working with English knowledge-grounded dialogues: The model is optimized for this domain.
  • Needing an efficient, single-pass hallucination detector: Its Llama3 backbone allows for faster detection compared to multi-pass methods.

You might need to consider alternatives or fine-tuning if:

  • Your primary language is not English.
  • You are working in highly specialized, domain-specific applications (e.g., clinical diagnosis) where further fine-tuning might be beneficial.
  • Your inputs exceed 8K tokens, the model's current effective context limit.
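One common workaround for the 8K-token ceiling is to split a long grounding context into overlapping windows and run detection once per window. The sketch below is a minimal illustration under two assumptions: a whitespace split stands in for real tokenization (a production pipeline would count tokens with the model's own tokenizer), and the window/overlap sizes are arbitrary placeholders.

```python
# Minimal sketch: split a long context into overlapping windows that
# each fit under the detector's input limit. Whitespace-split "tokens"
# are a rough proxy for real tokenizer counts.

def chunk_context(context: str, max_tokens: int = 7000, overlap: int = 200):
    """Return overlapping word-windows of at most max_tokens words,
    with `overlap` words shared between consecutive windows."""
    words = context.split()
    step = max_tokens - overlap
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, max(len(words), 1), step)
    ]

# A 15,000-word context splits into 3 overlapping windows:
chunks = chunk_context("word " * 15000, max_tokens=7000, overlap=200)
print(len(chunks))  # 3
```

Each window would then be packed into its own detection prompt, and per-sentence labels merged across windows (taking care with sentences that fall inside an overlap).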