KRLabsOrg/lettucedect-v2-qwen-2b

Vision · Open Weights · Concurrency Cost: 1 · Model Size: 2.3B · Quant: BF16 · Ctx Length: 32k · Tool Calling: Supported · Published: Jun 22, 2026 · License: apache-2.0 · Architecture: Transformer

KRLabsOrg/lettucedect-v2-qwen-2b is a 2.3 billion parameter generative hallucination detector based on Qwen3.5-2B, designed for Retrieval-Augmented Generation (RAG) and coding-agent settings. It localizes and categorizes hallucinated spans across code, tool output, and prose in a single pass, and it supports 14 languages. Its strongest results are on agentic code answers, where it identifies unsupported content more accurately than larger general-purpose judges and specialized detectors.

Overview: Generative Hallucination Span Detection

lettucedect-v2-qwen-2b is a 2.3 billion parameter instruction-tuned generative model from KRLabsOrg, built on Qwen3.5-2B. It functions as a unified hallucination detector, identifying and categorizing unsupported spans within generated text relative to a given context. Unlike token classifiers, this model outputs structured JSON, providing exact hallucinated spans with their category and subcategory in a single pass.
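
Because the detector answers with JSON rather than per-token labels, downstream code mostly needs to parse that JSON and map each span back onto the answer. The following is a minimal sketch of that step. The card names the three categories but does not give the exact schema, so the field names `spans`, `text`, `category`, and `subcategory`, the `version` subcategory value, and the sample response are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# Category names come from the model card. The JSON field names used
# below ("spans", "text", "category", "subcategory") are ASSUMPTIONS.
KNOWN_CATEGORIES = {"contradiction", "fabricated_reference", "unsupported_addition"}


@dataclass
class HallucinatedSpan:
    text: str
    category: str
    subcategory: str
    start: int  # character offset in the answer, -1 if the text is not found
    end: int    # exclusive end offset, -1 if the text is not found


def parse_detector_output(raw_json: str, answer: str) -> list[HallucinatedSpan]:
    """Parse a (hypothetical) detector response and locate each span in the answer."""
    payload = json.loads(raw_json)
    spans = []
    for item in payload.get("spans", []):
        category = item.get("category", "")
        if category not in KNOWN_CATEGORIES:
            continue  # ignore categories the card does not list
        text = item.get("text", "")
        # Locate the first occurrence of the span text in the answer.
        start = answer.find(text) if text else -1
        end = start + len(text) if start >= 0 else -1
        spans.append(
            HallucinatedSpan(text, category, item.get("subcategory", ""), start, end)
        )
    return spans


# Hypothetical response, written by hand for this example.
answer = "The function returns a list and was added in version 3.2."
raw = json.dumps({
    "spans": [
        {
            "text": "was added in version 3.2",
            "category": "unsupported_addition",
            "subcategory": "version",  # assumed subcategory name
        }
    ]
})
spans = parse_detector_output(raw, answer)
```

Resolving offsets with `str.find` only locates the first occurrence of the span text; if the real output already carries character offsets, those should be preferred.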

Key Capabilities & Differentiators

  • Unified Span Detection: Localizes and types hallucinated spans across diverse content, including prose (RAGTruth, PsiloQA), code (SWE-bench-derived agent traces), and tool output.
  • Multilingual Support: Trained on a unified benchmark covering English and 13 other languages via the PsiloQA dataset.
  • Superior Code Hallucination Detection: Achieves a span-F1 of 0.602 and an example-F1 of 0.835 on code-agent answers, outperforming larger judge models (120B and 550B parameters) and other off-the-shelf detectors, which tend to over-flag generated code.
  • Competitive Prose Performance: Matches or exceeds specialized methods on established prose benchmarks like RAGTruth (example-F1 0.818) and PsiloQA (IoU 0.724 in English).
  • Structured Output: Returns a JSON object detailing each hallucinated span, its category (contradiction, fabricated_reference, unsupported_addition), and subcategory.
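
The figures above mix three kinds of score: span-F1, example-F1, and IoU. The card does not state how each benchmark computes them, so the sketch below uses one common formulation as an assumption: span scores are computed over character positions, and example-F1 treats each answer as a binary "contains a hallucination" decision. The function names are illustrative only.

```python
def to_char_set(spans):
    """Expand (start, end) spans with exclusive ends into a set of character positions."""
    chars = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def span_iou(pred, gold):
    """Character-level intersection over union of predicted and gold spans."""
    p, g = to_char_set(pred), to_char_set(gold)
    if not p and not g:
        return 1.0  # nothing to flag and nothing flagged
    return len(p & g) / len(p | g)


def span_f1(pred, gold):
    """Character-level F1 of predicted spans against gold spans."""
    p, g = to_char_set(pred), to_char_set(gold)
    if not p and not g:
        return 1.0
    overlap = len(p & g)
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def example_f1(pred_flags, gold_flags):
    """F1 over per-example booleans: does the answer contain any hallucination?"""
    tp = sum(1 for p, g in zip(pred_flags, gold_flags) if p and g)
    fp = sum(1 for p, g in zip(pred_flags, gold_flags) if p and not g)
    fn = sum(1 for p, g in zip(pred_flags, gold_flags) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Predicted span covers characters 10..19, gold span covers 15..24.
iou = span_iou([(10, 20)], [(15, 25)])   # 5 shared / 15 total characters
f1 = span_f1([(10, 20)], [(15, 25)])     # precision 0.5, recall 0.5
ex_f1 = example_f1([True, True, False, False], [True, False, True, False])
```

A detector that over-flags code, as described above, would show high recall but low precision under this formulation, which pulls span-F1 down.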

Ideal Use Cases

This model is particularly well-suited for developers and researchers needing a fast, small, and accurate solution for:

  • RAG System Evaluation: Identifying and mitigating hallucinations in RAG-generated responses.
  • Code Agent Verification: Detecting unsupported or fabricated elements in code generated by AI agents.
  • Multilingual Content Moderation: Ensuring factual consistency across various languages.
  • Automated Content Quality Assurance: Providing granular feedback on factual accuracy at the span level.
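
For the span-level feedback use case, one simple way to surface detections to a reviewer is to mark them inline in the answer. The sketch below assumes spans are already available as `(start, end, category)` tuples with character offsets; the `[[category: text]]` marker format and the `annotate` helper are hypothetical and not part of the model's output.

```python
def annotate(answer, spans):
    """Wrap each flagged span in an inline marker such as [[category: text]]."""
    out = answer
    last_start = len(answer)
    # Work from right to left so earlier offsets stay valid after each insertion.
    for start, end, category in sorted(spans, key=lambda s: s[0], reverse=True):
        if start < 0 or start >= end or end > last_start:
            continue  # skip invalid spans and spans overlapping one already marked
        out = out[:start] + f"[[{category}: {out[start:end]}]]" + out[end:]
        last_start = start
    return out


# Hand-written example; the flagged span is illustrative, not model output.
answer = "Paris is the capital of France and has 90 million residents."
flagged = "has 90 million residents"
start = answer.find(flagged)
marked = annotate(answer, [(start, start + len(flagged), "unsupported_addition")])
```

Overlapping spans are dropped rather than merged here, which keeps the markers well-formed at the cost of hiding the overlapped detection.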