ICTNLP/Auto-RAG-Llama-3-8B-Instruct

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: Nov 29, 2024 · License: apache-2.0 · Architecture: Transformer

Auto-RAG: Autonomous Retrieval-Augmented Generation

ICTNLP/Auto-RAG-Llama-3-8B-Instruct is an 8 billion parameter language model developed by the ICTNLP Group, designed for autonomous retrieval-augmented generation (RAG). It is fine-tuned from Meta-Llama-3-8B-Instruct on synthesized iterative retrieval instruction data. The goal is to improve large language models on tasks that require dynamic, multi-step information retrieval.

Key Capabilities

  • Autonomous Retrieval-Augmented Generation: Optimized for tasks where the model needs to autonomously retrieve relevant information and integrate it into its responses.
  • Iterative Retrieval: Trained with data that simulates iterative retrieval processes, allowing for more sophisticated information gathering.
  • Extended Context Window: Supports an 8192-token context length, enabling the processing of longer documents and more complex queries.
  • Research-Backed: Developed by the ICTNLP Group, with details available in their paper "Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models" (arXiv:2411.19443).
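The iterative retrieval idea above can be sketched as a simple control loop: retrieve, let the model decide whether it has enough evidence to answer, and if not, let it issue a refined query. The sketch below stubs out both the retriever and the model's decision step with toy functions; in the actual Auto-RAG setup, the fine-tuned model itself produces the follow-up queries and the final answer.

```python
# Illustrative sketch of an autonomous iterative-retrieval loop (Auto-RAG style).
# `retrieve` and `model_step` are toy stand-ins, not the real retriever or model.

def retrieve(query, corpus):
    """Toy retriever: return passages sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [p for p in corpus if terms & set(p.lower().split())]

def auto_rag(question, corpus, model_step, max_rounds=3):
    """Iterate: retrieve, then let the model answer or emit a refined query."""
    query, evidence = question, []
    for _ in range(max_rounds):
        evidence += retrieve(query, corpus)
        decision = model_step(question, evidence)  # stand-in for the LLM call
        if decision["final"]:
            return decision["answer"]
        query = decision["next_query"]  # model chose to keep retrieving
    return "insufficient evidence"

# Stub "model": answers once a passage mentions Paris, otherwise refines the query.
def model_step(question, evidence):
    if any("Paris" in p for p in evidence):
        return {"final": True, "answer": "Paris"}
    return {"final": False, "next_query": "capital France"}

corpus = ["The capital of France is Paris.", "Berlin is in Germany."]
print(auto_rag("What is the capital of France?", corpus, model_step))  # → Paris
```

The loop terminates either when the model declares its evidence sufficient or after `max_rounds` retrievals, which bounds latency in the worst case.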

Use Cases

This model is particularly well-suited for applications requiring advanced RAG functionalities, such as:

  • Complex question answering systems that need to consult external knowledge bases.
  • Information synthesis from multiple sources.
  • Building intelligent agents that can autonomously gather and process information.

Developers can deploy this model with vLLM for efficient inference.
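As one possible deployment path (a sketch, assuming a recent vLLM release that provides the `vllm serve` command and a GPU with enough memory for an 8B model), the model can be served behind vLLM's OpenAI-compatible API and queried over HTTP:

```shell
# Serve the model behind an OpenAI-compatible endpoint on port 8000
vllm serve ICTNLP/Auto-RAG-Llama-3-8B-Instruct --max-model-len 8192

# From another shell, send a chat request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ICTNLP/Auto-RAG-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "Who wrote Hamlet?"}]
      }'
```

Because the endpoint is OpenAI-compatible, any OpenAI client library can be pointed at `http://localhost:8000/v1` without code changes.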