corag/CoRAG-Llama3.1-8B-MultihopQA
The corag/CoRAG-Llama3.1-8B-MultihopQA model, developed by corag, is an 8 billion parameter Llama 3.1-based language model fine-tuned for Multihop Question Answering (MultihopQA). With a 32768 token context length, it excels at complex, multi-step information retrieval and synthesis tasks. The model is optimized specifically for questions whose answers require multiple hops of reasoning, and demonstrates strong performance on benchmarks such as 2WikiQA, HotpotQA, Bamboogle, and MuSiQue.
CoRAG-Llama3.1-8B-MultihopQA Overview
This model is an 8 billion parameter Llama 3.1-based language model, specifically fine-tuned by corag for Multihop Question Answering (MultihopQA). It leverages the Chain-of-Retrieval Augmented Generation (CoRAG) approach, as detailed in the paper "Chain-of-Retrieval Augmented Generation" (arXiv:2501.14342).
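The core idea of CoRAG is to interleave retrieval with generation: instead of retrieving once, the model issues a chain of intermediate sub-queries, conditioning each step on the evidence gathered so far. The sketch below illustrates that loop with a toy keyword retriever and a fixed list of sub-queries; `toy_retrieve`, `corag_answer`, and the tiny corpus are illustrative stand-ins, not part of the actual model or the CoRAG codebase.

```python
# Minimal sketch of the chain-of-retrieval idea from CoRAG (arXiv:2501.14342).
# A real CoRAG model generates the sub-queries and the final answer itself;
# here both the retriever and the "generation" step are toy stand-ins.

def toy_retrieve(query: str, corpus: dict) -> str:
    """Hypothetical retriever: return the passage whose key appears in the query."""
    for key, passage in corpus.items():
        if key in query:
            return passage
    return ""

def corag_answer(question: str, sub_queries: list, corpus: dict) -> str:
    """Run a fixed chain of retrieval steps (chain length L = len(sub_queries)),
    accumulating evidence before producing a final answer."""
    evidence = []
    for sq in sub_queries:
        passage = toy_retrieve(sq, corpus)
        if passage:
            evidence.append(passage)
    # Stand-in for answer generation: surface the last retrieved passage.
    return evidence[-1] if evidence else "unknown"

corpus = {
    "director of Inception": "Inception was directed by Christopher Nolan.",
    "Christopher Nolan": "Christopher Nolan was born in London.",
}
# Two hops: question -> Christopher Nolan -> London.
sub_queries = [
    "Who is the director of Inception?",
    "Where was Christopher Nolan born?",
]
print(corag_answer("Where was the director of Inception born?", sub_queries, corpus))
```

The second sub-query only becomes answerable after the first hop resolves "the director of Inception" to Christopher Nolan, which is exactly the dependency structure multi-hop questions exhibit.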
Key Capabilities & Performance
The CoRAG-Llama3.1-8B-MultihopQA model is designed to tackle complex questions that require synthesizing information from multiple sources or steps. It has been evaluated on several challenging MultihopQA datasets, demonstrating superior performance compared to the base Llama-3.1-8B-Instruct and, in certain configurations, even GPT-4o. For instance, with a retrieval chain length of L=10 and best-of-8 decoding, it achieves:
- 72.5 EM / 77.3 F1 on 2WikiQA
- 56.3 EM / 69.8 F1 on HotpotQA
- 54.4 EM / 68.3 F1 on Bamboogle
- 30.9 EM / 42.4 F1 on MuSiQue
These results highlight its effectiveness in accurately extracting and combining information for multi-step reasoning. The model's training data is available as the MultihopQA dataset.
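The EM (exact match) and F1 scores above follow the standard extractive-QA convention: EM checks whether the normalized prediction equals the normalized gold answer, while F1 measures token-level overlap. The following is a minimal sketch of that SQuAD-style scoring, written here for illustration; it is not the exact evaluation script used for the numbers reported above.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and
    articles (a/an/the), and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    p_tokens = normalize(pred).split()
    g_tokens = normalize(gold).split()
    overlap = sum((Counter(p_tokens) & Counter(g_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p_tokens)
    recall = overlap / len(g_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(round(token_f1("Paris, France", "Paris"), 2))     # 0.67
```

F1 gives partial credit when the prediction contains the gold answer plus extra tokens, which is why the F1 columns above are uniformly higher than the EM columns.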
When to Use This Model
- Primary Use Case: This model is explicitly optimized for MultihopQA tasks, where questions require retrieving and synthesizing information across multiple documents or reasoning steps.
- Not Recommended For: General-purpose conversational AI, creative writing, or tasks outside of its specialized MultihopQA domain, as its performance may not be optimal for such applications.