What the fuck is this model about?
CogniDet is an 8-billion-parameter model built on the Llama3 architecture, developed by future7 as part of the CogniBench framework. Its primary purpose is to detect hallucinations in Large Language Model (LLM) outputs. Unlike detectors that focus only on factual inaccuracies, CogniDet is designed to identify both factual hallucinations (claims contradicting the provided context) and cognitive hallucinations (unsupported inferences or evaluations).
What makes THIS different from all the other models?
CogniDet stands out due to its dual detection capability and its foundation in a legal-inspired evaluation framework. It offers:
- Comprehensive Hallucination Detection: It doesn't just check facts; it assesses the logical grounding of inferences, a critical aspect often missed by simpler detectors.
- Efficient Inference: Leveraging an 8B Llama3 backbone, it performs single-pass detection, which is faster than traditional NLI-based methods.
- Robust Training: Trained on the extensive CogniBench-L dataset (24k+ dialogues, 234k+ annotated sentences), ensuring broad coverage and accuracy.
- Superior Performance: Achieves an F1 score of 73.80 for cognitive hallucination detection, outperforming baselines like SelfCheckGPT and RAGTruth.
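The single-pass workflow above can be sketched as a simple prompt-and-parse loop. Everything below is illustrative: the prompt template, label names, and the `future7/CogniDet` repository id are assumptions, not the model's documented interface; check the model card for the exact format.

```python
# Minimal sketch of single-pass hallucination detection.
# Prompt template and label set are assumptions for illustration only.

LABELS = {"faithful", "factual_hallucination", "cognitive_hallucination"}

def build_prompt(context: str, sentence: str) -> str:
    """Pack the grounding context and one response sentence into one prompt."""
    return (
        "Context:\n" + context.strip() + "\n\n"
        "Sentence:\n" + sentence.strip() + "\n\n"
        "Label the sentence as faithful, factual_hallucination, "
        "or cognitive_hallucination:\n"
    )

def parse_label(generation: str) -> str:
    """Return the first known label found in the model's generation."""
    for token in generation.lower().split():
        token = token.strip(".,:;")
        if token in LABELS:
            return token
    return "faithful"  # fallback when the output is unparseable

# Actual inference would look roughly like this (hypothetical repo id):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("future7/CogniDet")
# model = AutoModelForCausalLM.from_pretrained("future7/CogniDet")
# out = model.generate(**tok(build_prompt(ctx, sent), return_tensors="pt"))
# label = parse_label(tok.decode(out[0]))

print(parse_label("The sentence is a cognitive_hallucination."))
# → cognitive_hallucination
```

Because detection is a single forward pass per sentence, the loop over a response's sentences stays linear, unlike sampling-based methods such as SelfCheckGPT that require multiple generations per check.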
Should I use this for my use case?
You should consider using CogniDet if your use case involves:
- Evaluating LLM outputs for reliability: Especially when the LLM is expected to make inferences or evaluations based on provided context.
- Ensuring cognitive faithfulness: If it's crucial that your LLM's reasoning is sound and supported, not just factually correct.
- Working with English knowledge-grounded dialogues: The model is optimized for this domain.
- Needing an efficient, single-pass hallucination detector: Its Llama3 backbone allows for faster detection compared to multi-pass methods.
You might need to consider alternatives or fine-tuning if:
- Your primary language is not English.
- You are working in highly specialized, domain-specific applications (e.g., clinical diagnosis) where further fine-tuning might be beneficial.
- Your inputs exceed 8K tokens, the model's current context limit.
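For the context-limit point above, a pre-flight length check can avoid silently truncated inputs. This is a rough sketch: the whitespace word count is a crude proxy for tokens (a real check should use the model's own tokenizer), and the 8192 budget is an assumption based on the stated 8K limit.

```python
# Rough guard against exceeding the model's 8K-token context window.
# Word count is only a proxy for tokens; use the model's tokenizer in practice.

MAX_TOKENS = 8192  # assumed budget from the stated 8K limit

def fits_context(context: str, response: str, budget: int = MAX_TOKENS) -> bool:
    """Approximate check that context + response stay within the budget."""
    return len(context.split()) + len(response.split()) <= budget

def truncate_context(context: str, response: str, budget: int = MAX_TOKENS) -> str:
    """Drop trailing context words until the pair fits the budget."""
    keep = max(0, budget - len(response.split()))
    return " ".join(context.split()[:keep])
```

Truncating from the tail keeps the earliest grounding material; if the relevant evidence sits late in the context, a retrieval step to select passages would be a better fit than blind truncation.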