princeton-nlp/Llama-3-8B-ProLong-512k-Instruct
Llama-3-8B-ProLong-512k-Instruct is an 8 billion parameter instruction-tuned language model developed by Princeton NLP, based on Llama-3-8B. It is specifically optimized for long-context understanding, featuring an extended context window of 512,000 tokens. This model excels at processing and generating content over extremely long inputs, making it suitable for tasks requiring extensive document analysis or conversation history.
Overview
Princeton NLP's Llama-3-8B-ProLong-512k-Instruct is a long-context language model in the ProLong family, built on the Llama-3-8B architecture. It underwent extensive continued training and supervised fine-tuning to reach a maximum context window of 512,000 tokens, and it ranks among the top-performing long-context models at the 10B scale as evaluated by the HELMET benchmark.
Key Capabilities
- Exceptional Long-Context Handling: Processes and understands information across extremely large context windows, up to 512K tokens.
- Instruction Following: Fine-tuned with UltraChat for robust instruction-following capabilities.
- Optimized Training Methodology: Developed through thorough ablations on long-context pre-training and SFT data, as detailed in the paper "How to Train Long-Context Language Models (Effectively)".
Good For
- Document Analysis: Tasks requiring comprehension and summarization of very long documents, reports, or codebases.
- Extended Conversations: Maintaining coherence and context over prolonged dialogues or chat histories.
- Research and Development: As a strong base for further fine-tuning on domain-specific long-context applications.
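To get a feel for what a 512K-token window accommodates in the document-analysis use case above, the sketch below estimates token counts with a rough ~4-characters-per-token heuristic (an assumption for English text; exact counts require the model's actual Llama-3 tokenizer) and checks whether a batch of documents fits in a single prompt.

```python
# Rough token-budget check against ProLong's 512K context window.
# The 4 chars/token ratio is a heuristic assumption, not an exact
# figure; precise counts require the model's tokenizer.

CONTEXT_WINDOW = 512_000
CHARS_PER_TOKEN = 4  # assumed average for English prose

def estimate_tokens(text: str) -> int:
    """Estimate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(docs: list[str], reserve_for_output: int = 4_096) -> bool:
    """Check whether all docs plus an output reserve fit in the window."""
    budget = CONTEXT_WINDOW - reserve_for_output
    return sum(estimate_tokens(d) for d in docs) <= budget

if __name__ == "__main__":
    # ~1.9M characters estimate to ~475K tokens, within the 512K window.
    big_doc = "x" * 1_900_000
    print(fits_in_context([big_doc]))  # True
```

A check like this is useful as a cheap pre-filter before tokenizing; for production use, count tokens with the real tokenizer, since character-based estimates can be off substantially for code or non-English text.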