Buddhi-128K-Chat: Extended Context Chat Model
Buddhi-128K-Chat is a 7-billion-parameter chat model from AI Planet, built on the Mistral 7B Instruct base. Its primary differentiator is an extended context window of up to 128,000 tokens, achieved through the YaRN (Yet another RoPE extensioN) technique. This enhancement allows the model to maintain a coherent understanding of context across very long documents and conversations.
Key Capabilities & Features
- 128K Context Window: Handles extensive text inputs for tasks like summarizing large documents or engaging in prolonged dialogues.
- Mistral 7B Instruct Base: Leverages the strong reasoning capabilities of the Mistral 7B Instruct v0.2 model.
- YaRN Technique: Utilizes NTK-aware positional interpolation, including Dynamic-YaRN, to scale context length effectively (see the sketch after this list).
- Chat-based: Fine-tuned as a general-purpose chat model.
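At its core, YaRN interpolates only the low-frequency rotary dimensions (those completing few rotations over the original context window) while leaving high-frequency dimensions untouched, blending the two regimes with a linear ramp. The sketch below illustrates that "NTK-by-parts" frequency blending. It is a minimal illustration, not the Buddhi implementation: the function name and defaults (`scale`, `beta_fast`, `beta_slow`, the 8K original context) are assumptions for demonstration, and YaRN's additional attention-temperature rescaling is omitted.

```python
import torch

def yarn_inv_freq(dim: int = 128, base: float = 10000.0,
                  scale: float = 16.0, orig_ctx: int = 8192,
                  beta_fast: float = 32.0, beta_slow: float = 1.0) -> torch.Tensor:
    """Sketch of YaRN's NTK-by-parts RoPE frequency blending.

    All parameter defaults are illustrative, not Buddhi's actual config.
    """
    # Standard RoPE inverse frequencies, one per pair of head dimensions
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Full rotations each dimension completes over the original context
    rotations = orig_ctx * inv_freq / (2 * torch.pi)
    # Blend factor: 0 -> fully interpolated (frequency divided by scale),
    # 1 -> left as-is (high-frequency dims keep local resolution)
    ramp = ((rotations - beta_slow) / (beta_fast - beta_slow)).clamp(0, 1)
    return inv_freq / scale * (1 - ramp) + inv_freq * ramp
```

Dividing a dimension's frequency by `scale` is equivalent to positional interpolation on that dimension; applying it only where rotations are sparse is what lets YaRN stretch the context without blurring short-range token distinctions.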
Performance & Benchmarks
Buddhi-128K-Chat demonstrates competitive performance on both long- and short-context benchmarks. On the LongICLBench Banking77 task, it shows strong results across a range of context lengths, outperforming other 128K models such as NousResearch/Yarn-Mistral-7b-128k in certain configurations. On short-context tasks, it achieves an average score of 64.42 across the ARC, HellaSwag, Winogrande, TruthfulQA, and MMLU benchmarks, positioning it favorably among comparable 7B 128K models.
Hardware Requirements
- 128K Context: Requires approximately 80GB VRAM (A100 preferred).
- 32K Context: Requires approximately 40GB VRAM (A100 preferred).
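For reference, here is a minimal loading-and-generation sketch using Hugging Face transformers. The repository id `aiplanet/buddhi-128k-chat-7b` and the generation settings are assumptions to be checked against the model card; `device_map="auto"` lets the weights shard across whatever GPUs are available, which matters given the VRAM figures above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aiplanet/buddhi-128k-chat-7b"  # assumed repo id; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory relative to fp32
    device_map="auto",           # shards layers across available GPUs
)

# Build a prompt with the model's chat template
messages = [{"role": "user", "content": "Summarize the following report: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```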
Use Cases
- Comprehensive document summarization
- Detailed narrative generation
- Intricate question-answering over large texts
- Applications requiring extensive context retention in conversations