Buddhi-128K-Chat: Extended Context Chat Model
Buddhi-128K-Chat is a 7-billion-parameter chat model from AI Planet, built on the Mistral 7B Instruct base. Its primary differentiator is an extended context window of up to 128,000 tokens, achieved through the YaRN (Yet another RoPE extensioN) technique. This enhancement allows the model to maintain a coherent understanding of context across very long documents and conversations.
Key Capabilities & Features
- 128K Context Window: Handles extensive text inputs for tasks like summarizing large documents or engaging in prolonged dialogues.
- Mistral 7B Instruct Base: Leverages the strong reasoning capabilities of the Mistral 7B Instruct v0.2 model.
- YaRN Technique: Utilizes NTK-aware positional interpolation, including Dynamic-YaRN, to scale context length effectively (see the sketch after this list).
- Chat-based: Fine-tuned as a general-purpose chat model.
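At its core, YaRN interpolates only the low-frequency rotary dimensions (those completing few rotations over the original context window) while leaving high-frequency dimensions untouched, blending the two regimes with a linear ramp. The sketch below illustrates that "NTK-by-parts" frequency blending. It is a minimal illustration, not the Buddhi implementation: the function name and defaults (`scale`, `beta_fast`, `beta_slow`, the 8K original context) are assumptions for demonstration, and YaRN's additional attention-temperature rescaling is omitted.

```python
import torch

def yarn_inv_freq(dim: int = 128, base: float = 10000.0,
                  scale: float = 16.0, orig_ctx: int = 8192,
                  beta_fast: float = 32.0, beta_slow: float = 1.0) -> torch.Tensor:
    """Sketch of YaRN's NTK-by-parts RoPE frequency blending.

    All parameter defaults are illustrative, not Buddhi's actual config.
    """
    # Standard RoPE inverse frequencies, one per pair of head dimensions
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Full rotations each dimension completes over the original context
    rotations = orig_ctx * inv_freq / (2 * torch.pi)
    # Blend factor: 0 -> fully interpolated (frequency divided by scale),
    # 1 -> left as-is (high-frequency dims keep local resolution)
    ramp = ((rotations - beta_slow) / (beta_fast - beta_slow)).clamp(0, 1)
    return inv_freq / scale * (1 - ramp) + inv_freq * ramp
```

Dividing a dimension's frequency by `scale` is equivalent to positional interpolation on that dimension; applying it only where rotations are sparse is what lets YaRN stretch the context without blurring short-range token distinctions.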
Performance & Benchmarks
Buddhi-128K-Chat demonstrates competitive performance on both long- and short-context benchmarks. On the LongICLBench Banking77 task, it shows strong results across a range of context lengths, outperforming other 128K models such as NousResearch/Yarn-Mistral-7b-128k in certain configurations. On short-context tasks, it achieves an average score of 64.42 across the ARC, HellaSwag, Winogrande, TruthfulQA, and MMLU benchmarks, positioning it favorably among comparable 7B 128K models.
Hardware Requirements
- 128K Context: Requires approximately 80GB VRAM (A100 preferred).
- 32K Context: Requires approximately 40GB VRAM (A100 preferred).
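For reference, here is a minimal loading-and-generation sketch using Hugging Face transformers. The repository id `aiplanet/buddhi-128k-chat-7b` and the generation settings are assumptions to be checked against the model card; `device_map="auto"` lets the weights shard across whatever GPUs are available, which matters given the VRAM figures above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aiplanet/buddhi-128k-chat-7b"  # assumed repo id; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory relative to fp32
    device_map="auto",           # shards layers across available GPUs
)

# Build a prompt with the model's chat template
messages = [{"role": "user", "content": "Summarize the following report: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```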
Use Cases
- Comprehensive document summarization
- Detailed narrative generation
- Intricate question-answering over large texts
- Applications requiring extensive context retention in conversations