selfrag/selfrag_llama2_7b

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: Oct 18, 2023 · License: MIT · Architecture: Transformer · Open Weights

selfrag/selfrag_llama2_7b is a 7-billion-parameter Llama 2-based model developed by Akari Asai and collaborators, designed for Self-RAG (Self-Reflective Retrieval-Augmented Generation). The model generates output while adaptively calling a retrieval system and critiquing both its own generations and the retrieved passages using special reflection tokens. It excels at instruction-following tasks, using this fine-grained feedback to improve accuracy and relevance.


Overview

This model is a 7-billion-parameter variant of Llama 2, fine-tuned for the Self-RAG (Self-Reflective Retrieval-Augmented Generation) framework. Developed by Akari Asai and collaborators, it integrates a distinctive mechanism: the model not only generates responses but also adaptively interacts with a retrieval system and critically evaluates its own output and the retrieved information. This self-reflection capability is enabled by "reflection tokens" that guide the model in deciding when to retrieve, what to retrieve, and how to integrate or critique the retrieved information.

Key Capabilities

  • Adaptive Retrieval: The model can dynamically decide when to engage a retrieval system based on the query's needs, rather than retrieving for every query.
  • Self-Critique: It generates reflection tokens to assess the utility and factual grounding of its own generations and retrieved passages, leading to more accurate and relevant outputs.
  • Instruction Following: Trained on instruction-following corpora, it is adept at understanding and responding to diverse user queries.
  • Efficient Learning: Utilizes a standard next-token prediction objective with interleaved passages and reflection tokens for stable and efficient training.
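The self-critique capability surfaces in the raw generation as bracketed reflection tokens (e.g. `[Relevant]`, `[Fully supported]`, `[Utility:5]`). A downstream application typically strips these tokens out of the answer and uses them for scoring. The sketch below, with an assumed (non-official) helper name, shows one way to do that parsing; the token vocabulary listed matches the one described in the Self-RAG release.

```python
import re

# Reflection-token vocabulary used by Self-RAG: retrieval decisions,
# passage relevance, answer support, and overall utility.
RELEVANCE_TOKENS = {"[Relevant]", "[Irrelevant]"}
SUPPORT_TOKENS = ["[Fully supported]", "[Partially supported]",
                  "[No support / Contradictory]"]
UTILITY_PATTERN = re.compile(r"\[Utility:([1-5])\]")


def parse_reflection(output: str) -> dict:
    """Split a raw generation into the cleaned answer text and the
    model's self-assessments carried by its reflection tokens."""
    utility = UTILITY_PATTERN.search(output)
    tokens = set(re.findall(r"\[[^\]]+\]", output))
    # Remove all bracketed tokens and paragraph tags to recover the answer.
    cleaned = re.sub(r"\[[^\]]+\]", "", output)
    cleaned = re.sub(r"</?paragraph>", "", cleaned).strip()
    return {
        "needs_retrieval": "[Retrieval]" in tokens,
        "relevance": next(iter(tokens & RELEVANCE_TOKENS), None),
        "support": next((t for t in SUPPORT_TOKENS if t in output), None),
        "utility": int(utility.group(1)) if utility else None,
        "answer": cleaned,
    }
```

A `[Retrieval]` token in the output signals that the model wants evidence before (or while) answering, which is what makes retrieval adaptive rather than unconditional.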

Good For

  • Fact-checking and Grounded Generation: Ideal for applications requiring high factual accuracy by leveraging external knowledge sources and self-correction.
  • Complex Question Answering: Suitable for queries that benefit from external information retrieval and critical evaluation of answers.
  • Research and Development: Provides a robust framework for exploring advanced RAG techniques and self-improving language models. The full inference pipeline, including a retrieval system and fine-grained tree decoding, is available in the authors' code release.
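The full pipeline with a real retriever and tree decoding lives in the authors' repository; as a toy illustration of the control flow that reflection tokens enable, here is a minimal adaptive-retrieval loop with the model call and retriever passed in as stand-in functions (both are placeholders, not real APIs).

```python
def selfrag_answer(question, generate, retrieve):
    """Toy adaptive-retrieval loop.

    `generate` stands in for a call to the language model and `retrieve`
    for a passage retriever; this only sketches the decision logic, not
    the official Self-RAG inference pipeline.
    """
    first = generate(f"### Instruction:\n{question}\n\n### Response:\n")
    if "[Retrieval]" not in first:
        # The model judged its parametric knowledge sufficient.
        return first
    # The model asked for evidence: fetch a passage, then generate again
    # so the model can critique the passage inline.
    passage = retrieve(question)
    prompt = (f"### Instruction:\n{question}\n\n### Response:\n"
              f"[Retrieval]<paragraph>{passage}</paragraph>")
    return generate(prompt)
```

In the real system this branch point is scored over multiple candidate passages with the critique tokens, but the retrieve-only-when-asked structure is the same.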