Overview
This model is a 7-billion-parameter variant of Llama 2, fine-tuned for the Self-RAG (Self-Reflective Retrieval-Augmented Generation) framework. Developed by Akari Asai and collaborators, it not only generates responses but also adaptively interacts with a retrieval system and critically evaluates both its own output and the retrieved passages. This self-reflection is driven by special "reflection tokens" that the model emits to decide when to retrieve, whether a retrieved passage is relevant, and how well its answer is supported by the evidence.
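Because the reflection tokens are emitted inline with the answer, downstream code typically separates them from the response text. The sketch below uses the reflection-token names as they appear in the Self-RAG paper; the exact special-token spellings in the released tokenizer should be verified against the checkpoint.

```python
import re

# Reflection tokens as described in the Self-RAG paper (assumed spellings;
# check the released tokenizer's special tokens for the exact strings).
REFLECTION_TOKENS = [
    "[Retrieval]", "[No Retrieval]", "[Continue to Use Evidence]",
    "[Relevant]", "[Irrelevant]",
    "[Fully supported]", "[Partially supported]", "[No support / Contradictory]",
    "[Utility:1]", "[Utility:2]", "[Utility:3]", "[Utility:4]", "[Utility:5]",
]

def parse_generation(text):
    """Split a raw Self-RAG generation into plain answer text and the
    reflection tokens the model emitted, in order of appearance."""
    pattern = "|".join(re.escape(t) for t in REFLECTION_TOKENS)
    tokens = re.findall(pattern, text)
    answer = re.sub(pattern, "", text).strip()
    return answer, tokens

raw = ("[Relevant]Paris is the capital of France."
       "[Fully supported][Utility:5]")
answer, tokens = parse_generation(raw)
# answer -> "Paris is the capital of France."
# tokens -> ["[Relevant]", "[Fully supported]", "[Utility:5]"]
```

The parsed tokens can then be used to score or filter candidate generations, e.g. preferring segments marked `[Fully supported]` with a high utility rating.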
Key Capabilities
- Adaptive Retrieval: The model can dynamically decide when to engage a retrieval system based on the query's needs, rather than retrieving for every query.
- Self-Critique: It generates reflection tokens to assess the utility and factual grounding of its own generations and retrieved passages, leading to more accurate and relevant outputs.
- Instruction Following: Trained on instruction-following corpora, it is adept at understanding and responding to diverse user queries.
- Efficient Learning: Utilizes a standard next-token prediction objective with interleaved passages and reflection tokens for stable and efficient training.
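The adaptive-retrieval decision above can be sketched as a thresholded comparison between the model's probabilities for the `[Retrieval]` and `[No Retrieval]` tokens at the current decoding step. The logits dict here is a hypothetical stand-in for whatever your decoding stack exposes; the threshold is a user-set knob, as in the paper's inference-time control.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a dict of token logits."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def should_retrieve(retrieval_logits, threshold=0.5):
    """Fire the retriever when the normalized probability of [Retrieval]
    (vs. [No Retrieval]) clears the threshold. `retrieval_logits` is an
    assumed dict of raw logits for the two tokens at this decoding step."""
    probs = softmax(retrieval_logits)
    return probs["[Retrieval]"] > threshold

# Toy logits: the model strongly prefers retrieving for this query.
print(should_retrieve({"[Retrieval]": 3.2, "[No Retrieval]": 0.1}))  # True
```

Raising the threshold makes the pipeline retrieve less often, trading grounding for latency; lowering it approaches always-on retrieval.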
Good For
- Fact-checking and Grounded Generation: Ideal for applications requiring high factual accuracy by leveraging external knowledge sources and self-correction.
- Complex Question Answering: Suitable for queries that benefit from external information retrieval and critical evaluation of answers.
- Research and Development: Provides a robust framework for exploring advanced RAG techniques and self-improving language models. The full inference pipeline, including the retrieval system and fine-grained tree-decoding algorithm, is available in the official Self-RAG code repository.
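As a minimal sketch of how the model is prompted, the instruction format below follows the quick-start in the official repository: retrieved evidence is fed back to the model by appending a `[Retrieval]` token followed by the passage wrapped in `<paragraph>` tags. Treat the exact template strings as assumptions to verify against the released code.

```python
def format_prompt(instruction, paragraph=None):
    """Build a Self-RAG prompt; optionally inject a retrieved passage
    after a [Retrieval] token, per the repository's quick-start format."""
    prompt = "### Instruction:\n{0}\n\n### Response:\n".format(instruction)
    if paragraph is not None:
        prompt += "[Retrieval]<paragraph>{0}</paragraph>".format(paragraph)
    return prompt

# With vllm installed and the checkpoint downloaded, generation would
# look roughly like this (not run here):
#   from vllm import LLM, SamplingParams
#   model = LLM("selfrag/selfrag_llama2_7b", dtype="half")
#   params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=100,
#                           skip_special_tokens=False)
#   preds = model.generate([format_prompt("Tell me about Llama 2.")], params)
```

Note that `skip_special_tokens=False` is needed so the reflection tokens survive decoding and can be inspected by the caller.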