occ-ai/OCC-RAG-0.6B

Text Generation | Concurrency Cost: 1 | Model Size: 0.8B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: May 29, 2026 | License: mit | Architecture: Transformer | Open Weights

OCC-RAG-0.6B is a 0.6 billion parameter small language model developed by occ-ai, specialized for faithful, context-grounded question answering. It produces structured reasoning traces with explicit source citations and abstains when the provided context does not support an answer. Despite its compact size, it matches or exceeds general-purpose models 2 to 6 times larger on multi-hop reasoning, faithfulness, and refusal benchmarks, which makes it well suited to efficient and reliable RAG applications.


OCC-RAG-0.6B: Optimal Cognitive Core for Faithful Question Answering

OCC-RAG-0.6B is a 0.6 billion parameter small language model from occ-ai, designed for faithful, context-grounded question answering. It belongs to the first generation of Optimal Cognitive Core (OCC) specialized reasoning models. The model generates structured reasoning traces with explicit source citations, determines whether the context supports an answer, and abstains when the information is insufficient. It is mid-trained from Qwen/Qwen3-0.6B-Base on a large synthetic corpus of multi-context, multi-hop QA pairs with citation-anchored reasoning traces.

Key Capabilities

  • Faithful by design: Answers exclusively from the provided context, with the lowest memorization ratio among all evaluated models, including models of up to 32B parameters.
  • Calibrated abstention: Reliably outputs "Not enough information" when context does not support a query.
  • Structured, citable reasoning: Provides a transparent reasoning trace (query analysis → source analysis → reasoning → status → answer) with source IDs.
  • Compact efficiency: Delivers chain-of-thought-level transparency and performance at a fraction of the inference cost of larger models.
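
The source-ID citations and the abstention behaviour listed above suggest a simple wrapper on the application side. The following is a minimal sketch only: the prompt layout, the `Source` class, and the helper names `build_prompt` and `is_abstention` are hypothetical and are not taken from the model's documentation. The only detail drawn from this page is the abstention phrase "Not enough information".

```python
# Hypothetical sketch: the prompt layout and helper names below are
# assumptions for illustration, not the model's documented format.
from __future__ import annotations

from dataclasses import dataclass

# Abstention phrase quoted in the capability list, lowercased for matching.
ABSTAIN_MARKER = "not enough information"


@dataclass
class Source:
    source_id: str  # identifier the answer is expected to cite, e.g. "S1"
    text: str       # retrieved passage


def build_prompt(question: str, sources: list[Source]) -> str:
    """Lay out each passage under its ID so an answer can cite it."""
    blocks = [f"[{s.source_id}] {s.text.strip()}" for s in sources]
    return "Sources:\n" + "\n".join(blocks) + f"\n\nQuestion: {question.strip()}"


def is_abstention(model_output: str) -> bool:
    """Return True when the output contains the abstention phrase (case-insensitive)."""
    return ABSTAIN_MARKER in model_output.lower()
```

A caller might route abstentions to a fallback, such as widening retrieval, rather than showing them as answers. Substring matching is a deliberately simple choice here; a stricter check would depend on the exact output format, which this page does not specify.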

Good for

  • Retrieval-Augmented Generation (RAG) systems: Ensures answers are strictly grounded in provided documents.
  • Applications requiring high faithfulness: Minimizes hallucination by refusing to answer outside of given context.
  • Resource-constrained environments: Its small size (0.6B parameters) allows for deployment on desktop systems or other limited infrastructure.
  • Multi-hop and multi-context question answering: Demonstrates strong performance on complex reasoning tasks, matching or exceeding larger general-purpose models.
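
To make the resource-constrained claim concrete, a rough estimate of weight memory can be computed from the parameter count and the BF16 format in the listing (2 bytes per parameter). This is a back-of-the-envelope sketch: the function name `weight_memory_gb` is hypothetical, and the figure covers weights only, not activations, KV cache, or runtime overhead. The listing reports a model size of 0.8B while the description says 0.6B, so the count is left as an argument.

```python
# Rough estimate of weight memory only; activations, KV cache, and
# runtime overhead are not included.
BYTES_PER_BF16_PARAM = 2  # bfloat16 stores each value in 16 bits


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_BF16_PARAM) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter count from the description, and the size shown in the listing.
nominal_gb = weight_memory_gb(0.6e9)
listed_gb = weight_memory_gb(0.8e9)
```

Under these assumptions the weights take roughly 1.2 GB at 0.6B parameters and roughly 1.6 GB at 0.8B, which is consistent with the desktop deployment scenario described above.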