Overview
Nous-Yarn-Mistral-7b-128k: Extended Context Mistral Model
Nous-Yarn-Mistral-7b-128k is a 7-billion-parameter language model built on Mistral-7B-v0.1. Developed by NousResearch, it was further pretrained for 1500 steps using YaRN (Yet another RoPE extensioN method), extending its context window to 128,000 tokens.
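As a minimal loading sketch with Hugging Face transformers: the repo id `NousResearch/Yarn-Mistral-7b-128k` and the `trust_remote_code` flag are assumptions based on how YaRN checkpoints are commonly published, so verify both against the actual model card.

```python
# Minimal loading sketch (assumed repo id; YaRN models often ship custom
# modeling code, hence trust_remote_code=True -- verify on the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Mistral-7b-128k"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision so a 7B model fits on one GPU
    device_map="auto",           # let accelerate place layers across devices
    trust_remote_code=True,      # assumed: YaRN scaling may require custom code
)
```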
Key Capabilities & Features
- Extended Context Window: Supports a 128k-token context, suited to applications that process very long documents or conversations (see the inference sketch after this list).
- Strong Long-Context Performance: Maintains low perplexity across long-context evaluation lengths from 8k to 128k tokens, reaching 2.19 at 128k tokens.
- Minimal Short-Context Degradation: Benchmarks indicate that extending the context window with YaRN causes only slight losses on standard short-context tasks such as ARC-c, HellaSwag, MMLU, and TruthfulQA, preserving most of the original Mistral-7B's capabilities.
- Based on Mistral-7B-v0.1: Inherits the robust base performance and efficiency of the Mistral-7B architecture.
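To make the long-context capability concrete, the sketch below continues from the loading snippet above and feeds an entire document into a single prompt; `report.txt` and the token budget are placeholders. Since the model card describes further pretraining rather than instruction tuning, a completion-style prompt (the `TL;DR:` suffix) is assumed to be a safer bet than chat-style instructions.

```python
# Long-context inference sketch (continues from the loading snippet above;
# "report.txt" is a placeholder for any long document).
with open("report.txt") as f:
    long_document = f.read()

prompt = long_document + "\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(f"Prompt length: {inputs['input_ids'].shape[1]} tokens")  # can approach 128k

output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```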
Ideal Use Cases
- Document Analysis: Processing and summarizing extensive reports, legal documents, or research papers.
- Long-form Content Generation: Creating or understanding lengthy articles, books, or complex narratives.
- Extended Chatbots/Conversational AI: Maintaining context over very long dialogues without losing coherence (a minimal prompting pattern follows this list).
- Code Analysis: Handling large codebases or complex programming projects where extensive context is beneficial.
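For the conversational use case, one minimal pattern is to keep the entire dialogue in the prompt and let the large context window absorb it, rather than truncating or summarizing history. This is a sketch under assumptions: the `User:`/`Assistant:` labels are an illustrative convention, not a format the model was trained on, and a base model may need few-shot examples to follow it reliably.

```python
# Conversational sketch: the whole history rides along in every prompt,
# relying on the 128k window instead of truncation or summarization.
# The "User:"/"Assistant:" labels are an illustrative convention only.
history: list[str] = []

def chat_turn(user_message: str) -> str:
    history.append(f"User: {user_message}")
    prompt = "\n".join(history) + "\nAssistant:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
    reply = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # A base model may keep generating past one turn; cut at the next "User:".
    reply = reply.split("\nUser:")[0].strip()
    history.append(f"Assistant: {reply}")
    return reply

print(chat_turn("Summarize the key risks in this contract."))
```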