# BRAG-Llama-3.1-8b-v0.1: RAG-Optimized SLM
BRAG-Llama-3.1-8b-v0.1 is an 8-billion-parameter Small Language Model (SLM) in the maximalists' BRAG series, built specifically for Retrieval-Augmented Generation (RAG) tasks. It supports an extended context length of up to 128k tokens, making it suitable for processing substantial amounts of retrieved information.
## Key Capabilities
- RAG with Diverse Data: Proficient in RAG tasks involving both structured (tables) and unstructured (text) data.
- Conversational RAG: Optimized for integrating RAG into conversational chat applications.
- Extended Context: Features a 128k token context window, allowing for comprehensive information retrieval and generation.
- English-Centric: Primarily trained and evaluated for English, leveraging the base model's multilingual foundation.
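The conversational-RAG pattern above amounts to packing retrieved context (tables or text) into the prompt alongside the user's question. A minimal sketch of assembling such a prompt is shown below; the helper name and the system-prompt wording are illustrative placeholders, and in practice the exact recommended system prompt from the official model card should be used verbatim.

```python
from typing import Dict, List


def build_rag_messages(context_chunks: List[str], question: str) -> List[Dict[str, str]]:
    """Assemble a chat-style message list that places retrieved context
    (structured tables or unstructured text) in the system turn and the
    user's question in the user turn."""
    # Placeholder instruction; substitute the model card's recommended prompt.
    system_prompt = (
        "You are an assistant that answers strictly from the provided context.\n\n"
        "Context:\n" + "\n\n".join(context_chunks)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


messages = build_rag_messages(
    ["| Year | Revenue |\n| 2023 | $10M |", "The company was founded in 2019."],
    "What was revenue in 2023?",
)
# The resulting messages can then be passed to a chat-template tokenizer
# (e.g. Hugging Face transformers' tokenizer.apply_chat_template).
```

This keeps retrieval and generation decoupled: the retriever decides *what* context to include, while the message builder only decides *where* it goes in the prompt.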
## Performance Highlights
The model demonstrates competitive performance on the ChatRAG-Bench, scoring 52.29. This places it favorably against other SLMs and even some larger LLMs in RAG-specific evaluations. For instance, it outperforms BRAG-Llama-3-8b-v0.1 (51.70) and is close to BRAG-Qwen2-7b-v0.1 (53.23) and GPT-4-Turbo (54.03) on this benchmark.
## Limitations
Although the model exposes a 128k-token context window, it was fine-tuned on comparatively short sequences and may not perform optimally on very long inputs. Using the recommended system prompt is also crucial; deviating from it can lead to underperformance and hallucinations.
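Given the short-context fine-tuning noted above, it can help to cap how much retrieved text is packed into each prompt. The sketch below uses a simple character budget as a stand-in; a real pipeline would count tokens with the model's tokenizer instead, and the budget value here is an arbitrary assumption, not a figure from the model card.

```python
from typing import List


def truncate_chunks(chunks: List[str], char_budget: int = 8000) -> List[str]:
    """Keep whole retrieved chunks, in ranked order, until the budget is
    exhausted; drop the remainder rather than splitting a chunk mid-way."""
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > char_budget:
            break  # stop at the first chunk that would overflow the budget
        kept.append(chunk)
        used += len(chunk)
    return kept
```

Keeping chunks whole (rather than slicing them) preserves table rows and sentences intact, which matters for a model that answers from structured context.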