princeton-nlp/Llama-3-8B-ProLong-512k-Instruct

Warm
Public
8B
FP8
8192
Aug 22, 2024
License: llama3
Hugging Face
Overview

Princeton NLP's Llama-3-8B-ProLong-512k-Instruct is a long-context language model from the ProLong family, built on the Llama-3-8B architecture. Through extensive continued training and supervised fine-tuning, it reaches a maximum context window of 512,000 tokens, and it ranks among the top-performing long-context models at the 10B scale as evaluated by HELMET.

Key Capabilities

  • Exceptional Long-Context Handling: Processes and understands information across extremely large context windows, up to 512K tokens.
  • Instruction Following: Fine-tuned with UltraChat for robust instruction-following capabilities.
  • Optimized Training Methodology: Developed through thorough ablations on long-context pre-training data and SFT data, detailed in the paper "How to Train Long-Context Language Models (Effectively)".
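Before sending a large document to the model, it can help to estimate whether it fits in the 512K-token window. The sketch below uses a rough ~4 characters-per-token heuristic for English prose; this ratio is an assumption, not a property of the Llama-3 tokenizer, so use the actual tokenizer for an exact count.

```python
# Rough check of whether a document fits ProLong's 512K-token window.
# The 4 chars/token ratio is a common rule of thumb for English text only.
MAX_CONTEXT_TOKENS = 512_000
CHARS_PER_TOKEN = 4  # heuristic; use the real tokenizer for exact counts

def estimated_tokens(text: str) -> int:
    """Estimate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_context(text: str, reserve_for_output: int = 1024) -> bool:
    """True if the text likely fits, leaving room for the model's reply."""
    return estimated_tokens(text) + reserve_for_output <= MAX_CONTEXT_TOKENS

doc = "word " * 100_000  # ~500K characters, roughly 125K tokens
print(fits_context(doc))
```

A document of roughly 125K estimated tokens fits comfortably; anything approaching 2M characters should be checked with the real tokenizer before submission.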

Good For

  • Document Analysis: Tasks requiring comprehension and summarization of very long documents, reports, or codebases.
  • Extended Conversations: Maintaining coherence and context over prolonged dialogues or chat histories.
  • Research and Development: As a strong base for further fine-tuning on domain-specific long-context applications.
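For document-analysis use cases like those above, the model can be queried with the standard Hugging Face transformers API. The sketch below is illustrative, not the authors' reference usage: the chat formatting relies on the tokenizer's built-in chat template, and the generation settings (`bfloat16`, `max_new_tokens=512`) are assumptions you may want to tune.

```python
# Sketch: asking a question about a long document with transformers.
def build_messages(document: str, question: str) -> list[dict]:
    """Pack a long document and a question into a chat-style message list."""
    return [{"role": "user", "content": f"{document}\n\nQuestion: {question}"}]

def answer(document: str, question: str,
           model_id: str = "princeton-nlp/Llama-3-8B-ProLong-512k-Instruct") -> str:
    # Imported here so build_messages stays usable without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # The tokenizer applies the Llama-3 chat template for us.
    inputs = tokenizer.apply_chat_template(
        build_messages(document, question),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0, inputs.shape[-1]:],
                            skip_special_tokens=True)
```

Note that running the full 512K window requires substantial GPU memory; for shorter contexts the same code works unchanged.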