WisdomShell/GRIP-Llama-3-8B
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quantization: FP8 · Context Length: 8k · Published: Apr 9, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

WisdomShell/GRIP-Llama-3-8B is a Llama-3-8B based model developed by Bo Li et al. that re-conceptualizes retrieval as an intrinsic generative capability of the LLM. It integrates retrieval decisions directly into token-level decoding using control tokens, enabling self-triggered information planning for complex reasoning tasks. The model excels at dynamic, multi-turn question answering by autonomously evaluating its existing knowledge and formulating contextual follow-up queries.


GRIP-Llama-3-8B: Retrieval as Generation

WisdomShell/GRIP-Llama-3-8B is a Llama-3-8B based model that introduces a novel paradigm called GRIP (Generation-guided Retrieval with Information Planning). Unlike traditional RAG systems that treat retrieval as an external, one-shot process, GRIP internalizes retrieval decisions directly into the model's generative policy. This allows for end-to-end, self-triggered information planning within a single autoregressive trajectory, making retrieval an intrinsic part of the generation process.
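To make the "single autoregressive trajectory" concrete, here is an illustrative sketch of what a GRIP-style output might look like and how it could be segmented. Only the control tokens `[RETRIEVE]`, `[INTERMEDIARY]`, and `[ANSWER]` are taken from the model card; the example trajectory and the `segment` helper are assumptions for illustration, not part of the released model.

```python
# Illustrative only: a hypothetical GRIP-style trajectory in which retrieval
# queries and evidence integration are interleaved with reasoning via control
# tokens, all inside one generated sequence.
import re

trajectory = (
    "The question asks who directed the film that won Best Picture in 1998. "
    "[RETRIEVE] Best Picture winner 1998 "
    "[INTERMEDIARY] Titanic won Best Picture at the 1998 ceremony. "
    "[RETRIEVE] director of Titanic 1997 film "
    "[ANSWER] James Cameron"
)

def segment(traj: str):
    """Split a trajectory on control tokens, labeling each span."""
    parts = re.split(r"(\[RETRIEVE\]|\[INTERMEDIARY\]|\[ANSWER\])", traj)
    segments, label = [], "REASON"
    for part in parts:
        part = part.strip()
        if not part:
            continue
        if part in ("[RETRIEVE]", "[INTERMEDIARY]", "[ANSWER]"):
            label = part.strip("[]")   # next span belongs to this behavior
        else:
            segments.append((label, part))
    return segments

for label, text in segment(trajectory):
    print(f"{label:12s} {text}")
```

The point of the sketch: the model's reformulated queries ("director of Titanic 1997 film") are conditioned on evidence it retrieved earlier in the same trajectory, which is what distinguishes this from one-shot RAG.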

Key Capabilities

  • Token-Driven Control: Embeds retrieval behaviors directly into the model's generative policy using explicit control tokens (e.g., [RETRIEVE], [ANSWER], [INTERMEDIARY]).
  • Self-Triggered Planning: Autonomously decides when to use internal knowledge, how to reformulate targeted queries based on partial reasoning, and when to terminate search.
  • Adaptive Retrieval Depth: Dynamically adjusts the number of retrieval rounds based on question complexity, avoiding redundant searches.
  • Unified Decoding Trajectory: Tightly couples multi-step reasoning and on-the-fly evidence integration into a continuous generation flow.
  • Optimized Training: Utilizes structured supervised fine-tuning (SFT) over four distinct behavioral patterns, further refined by rule-based Reinforcement Learning (DAPO) for accurate and balanced retrieval control.
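The control-token behaviors above can be sketched as a decoding loop that pauses on `[RETRIEVE]`, fetches evidence, and resumes generation with the evidence in context. This is a minimal sketch under stated assumptions: the control tokens come from the model card, but `run_grip_loop`, the scripted mock "model", and the toy retriever are illustrative stand-ins for GRIP-Llama-3-8B and a real search backend.

```python
# Minimal sketch of a control-token-driven retrieval loop (not the official
# GRIP inference code). A scripted mock generator and a dict-backed retriever
# stand in for the real model and search index.

def run_grip_loop(generate, retrieve, prompt, max_rounds=4):
    """Generate until [ANSWER]; on each [RETRIEVE], fetch evidence and resume.

    `generate(context)` must return (text, control_token). Retrieved evidence
    is appended to the context so the next step conditions on it.
    """
    context, rounds = prompt, 0
    while rounds <= max_rounds:
        text, token = generate(context)
        context += text
        if token == "[ANSWER]":          # model decides it has enough evidence
            return context, rounds
        if token == "[RETRIEVE]":        # self-triggered retrieval: the query
            rounds += 1                  # is formulated from partial reasoning
            context += f"\n[EVIDENCE] {retrieve(text)}\n"
    return context, rounds               # depth cap hit without an answer

# Scripted mock: two retrieval rounds, then an answer (adaptive depth in
# miniature -- an easier question would answer after zero or one round).
script = iter([
    ("who won Best Picture in 1998", "[RETRIEVE]"),
    ("who directed Titanic", "[RETRIEVE]"),
    ("James Cameron directed it.", "[ANSWER]"),
])
mock_generate = lambda context: next(script)
mock_retrieve = lambda query: {"who won Best Picture in 1998": "Titanic",
                               "who directed Titanic": "James Cameron"}[query]

final, rounds = run_grip_loop(
    mock_generate, mock_retrieve,
    "Q: Who directed the 1998 Best Picture winner?\n")
print(rounds)   # 2
```

Note the design choice this models: retrieval depth is not fixed in advance; the number of `[RETRIEVE]` emissions is itself a generation decision, which is what "adaptive retrieval depth" means in practice.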

Performance & Use Cases

GRIP-Llama-3-8B demonstrates state-of-the-art performance, surpassing strong open-source RAG baselines like GainRAG and R1-Searcher. It achieves performance competitive with GPT-4o across five QA benchmarks, despite using a significantly smaller Llama-3-8B backbone. This makes it particularly suitable for complex question-answering scenarios requiring dynamic information retrieval and multi-step reasoning, where traditional RAG systems might fall short due to their rigid, external retrieval mechanisms.