Overview
This model is a 7-billion-parameter variant of Llama 2, fine-tuned for the Self-RAG (Self-Reflective Retrieval-Augmented Generation) framework. Developed by Akari Asai and collaborators, it not only generates responses but also adaptively interacts with a retrieval system and critically evaluates both its own output and the retrieved passages. This self-reflection is driven by special "reflection tokens" that the model emits to decide when to retrieve, whether a retrieved passage is relevant, and how well its answer is supported by the evidence.
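Because the reflection tokens are emitted inline with the answer, downstream code typically separates them from the response text. The sketch below uses the reflection-token names as they appear in the Self-RAG paper; the exact special-token spellings in the released tokenizer should be verified against the checkpoint.

```python
import re

# Reflection tokens as described in the Self-RAG paper (assumed spellings;
# check the released tokenizer's special tokens for the exact strings).
REFLECTION_TOKENS = [
    "[Retrieval]", "[No Retrieval]", "[Continue to Use Evidence]",
    "[Relevant]", "[Irrelevant]",
    "[Fully supported]", "[Partially supported]", "[No support / Contradictory]",
    "[Utility:1]", "[Utility:2]", "[Utility:3]", "[Utility:4]", "[Utility:5]",
]

def parse_generation(text):
    """Split a raw Self-RAG generation into plain answer text and the
    reflection tokens the model emitted, in order of appearance."""
    pattern = "|".join(re.escape(t) for t in REFLECTION_TOKENS)
    tokens = re.findall(pattern, text)
    answer = re.sub(pattern, "", text).strip()
    return answer, tokens

raw = ("[Relevant]Paris is the capital of France."
       "[Fully supported][Utility:5]")
answer, tokens = parse_generation(raw)
# answer -> "Paris is the capital of France."
# tokens -> ["[Relevant]", "[Fully supported]", "[Utility:5]"]
```

The parsed tokens can then be used to score or filter candidate generations, e.g. preferring segments marked `[Fully supported]` with a high utility rating.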
Key Capabilities
- Adaptive Retrieval: The model can dynamically decide when to engage a retrieval system based on the query's needs, rather than retrieving for every query.
- Self-Critique: It generates reflection tokens to assess the utility and factual grounding of its own generations and retrieved passages, leading to more accurate and relevant outputs.
- Instruction Following: Trained on instruction-following corpora, it is adept at understanding and responding to diverse user queries.
- Efficient Learning: Utilizes a standard next-token prediction objective with interleaved passages and reflection tokens for stable and efficient training.
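The adaptive-retrieval decision above can be sketched as a thresholded comparison between the model's probabilities for the `[Retrieval]` and `[No Retrieval]` tokens at the current decoding step. The logits dict here is a hypothetical stand-in for whatever your decoding stack exposes; the threshold is a user-set knob, as in the paper's inference-time control.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a dict of token logits."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def should_retrieve(retrieval_logits, threshold=0.5):
    """Fire the retriever when the normalized probability of [Retrieval]
    (vs. [No Retrieval]) clears the threshold. `retrieval_logits` is an
    assumed dict of raw logits for the two tokens at this decoding step."""
    probs = softmax(retrieval_logits)
    return probs["[Retrieval]"] > threshold

# Toy logits: the model strongly prefers retrieving for this query.
print(should_retrieve({"[Retrieval]": 3.2, "[No Retrieval]": 0.1}))  # True
```

Raising the threshold makes the pipeline retrieve less often, trading grounding for latency; lowering it approaches always-on retrieval.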
Good For
- Fact-checking and Grounded Generation: Ideal for applications requiring high factual accuracy by leveraging external knowledge sources and self-correction.
- Complex Question Answering: Suitable for queries that benefit from external information retrieval and critical evaluation of answers.
- Research and Development: Provides a robust framework for exploring advanced RAG techniques and self-improving language models. The full inference pipeline, including the retrieval system and fine-grained tree-decoding algorithm, is available in the official Self-RAG code repository.
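As a minimal sketch of how the model is prompted, the instruction format below follows the quick-start in the official repository: retrieved evidence is fed back to the model by appending a `[Retrieval]` token followed by the passage wrapped in `<paragraph>` tags. Treat the exact template strings as assumptions to verify against the released code.

```python
def format_prompt(instruction, paragraph=None):
    """Build a Self-RAG prompt; optionally inject a retrieved passage
    after a [Retrieval] token, per the repository's quick-start format."""
    prompt = "### Instruction:\n{0}\n\n### Response:\n".format(instruction)
    if paragraph is not None:
        prompt += "[Retrieval]<paragraph>{0}</paragraph>".format(paragraph)
    return prompt

# With vllm installed and the checkpoint downloaded, generation would
# look roughly like this (not run here):
#   from vllm import LLM, SamplingParams
#   model = LLM("selfrag/selfrag_llama2_7b", dtype="half")
#   params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=100,
#                           skip_special_tokens=False)
#   preds = model.generate([format_prompt("Tell me about Llama 2.")], params)
```

Note that `skip_special_tokens=False` is needed so the reflection tokens survive decoding and can be inspected by the caller.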