NousResearch/Yarn-Mistral-7b-128k

Public · 7B · FP8 · 8192 · License: apache-2.0 · Hugging Face
Overview

Nous-Yarn-Mistral-7b-128k: Extended Context Mistral Model

Nous-Yarn-Mistral-7b-128k is a 7 billion parameter language model built on the Mistral-7B-v0.1 architecture. Developed by NousResearch, it was further pretrained on long-context data for 1,500 steps using YaRN (Yet another RoPE extensioN method), extending its context window to 128,000 tokens.
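
Below is a minimal loading sketch using the Hugging Face transformers library. The dtype, device_map, and attention settings shown are common choices rather than requirements stated on this page; trust_remote_code=True is typically needed so the model's custom YaRN RoPE-scaling code is applied.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Yarn-Mistral-7b-128k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; adjust for your hardware
    device_map="auto",           # spread the weights across available GPUs
    trust_remote_code=True,      # loads the custom YaRN RoPE-scaling code
)
```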

Key Capabilities & Features

  • Extended Context Window: Supports a 128k token context, making it suitable for applications that need to process very long documents or conversations (see the usage sketch after this list).
  • Strong Long-Context Performance: Maintains competitive perplexity across evaluation context lengths from 8k to 128k tokens, reaching 2.19 at 128k.
  • Minimal Short-Context Degradation: Benchmarks indicate that extending the context window with YaRN causes only minimal degradation on standard short-context tasks such as ARC-c, HellaSwag, MMLU, and TruthfulQA, preserving most of the original Mistral-7B's capabilities.
  • Based on Mistral-7B-v0.1: Inherits the robust base performance and efficiency of the Mistral-7B architecture.
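
A minimal usage sketch of the long-context capability, assuming the model and tokenizer were loaded as above; the file name report.txt, the prompt wording, and the generation settings are illustrative assumptions, not taken from this page.

```python
# Read a long document; the path is a hypothetical placeholder.
with open("report.txt", encoding="utf-8") as f:
    long_document = f.read()

prompt = (
    "Summarize the key findings of the following report:\n\n"
    + long_document
    + "\n\nSummary:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# The prompt can run well past the usual 8k limit, up to roughly 128k tokens.
print(f"Prompt length: {inputs['input_ids'].shape[1]} tokens")

output_ids = model.generate(
    **inputs,
    max_new_tokens=512,   # length budget for the summary
    do_sample=False,      # greedy decoding for a repeatable result
)

# Decode only the newly generated tokens, skipping the echoed prompt.
summary = tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(summary)
```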

Ideal Use Cases

  • Document Analysis: Processing and summarizing extensive reports, legal documents, or research papers.
  • Long-form Content Generation: Creating or understanding lengthy articles, books, or complex narratives.
  • Extended Chatbots/Conversational AI: Maintaining context over very long dialogues without losing coherence.
  • Code Analysis: Handling large codebases or complex programming projects where extensive context is beneficial.