princeton-nlp/Llama-3-8B-ProLong-64k-Instruct

Text generation · Model size: 8B · Quantization: FP8 · Context length: 64K · Published: Jul 21, 2024 · License: llama3 · Architecture: Transformer

princeton-nlp/Llama-3-8B-ProLong-64k-Instruct is an 8 billion parameter instruction-tuned language model developed by Princeton NLP. It belongs to the ProLong family of long-context models and is continually trained from Llama-3-8B. This variant supports a 64K-token context window, making it suitable for tasks that require processing moderately long inputs.


Model Overview

princeton-nlp/Llama-3-8B-ProLong-64k-Instruct is an 8 billion parameter instruction-tuned model from the ProLong (Princeton long-context language models) family, developed by Princeton NLP. It is continually trained from Llama-3-8B and specifically optimized for handling long contexts. This model is one variant within the ProLong collection, which includes models with context windows up to 512K tokens.
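Since the checkpoint is distributed in the standard Hugging Face format, it is typically queried through `transformers` with the model's own chat template. As a lightweight illustration of what that template produces, the sketch below builds a Llama-3-style chat prompt by hand. It assumes ProLong inherits the stock Llama-3 chat format (header tokens and `<|eot_id|>`), which you should verify against the tokenizer's `apply_chat_template` output:

```python
# Sketch: hand-rolled Llama-3 chat prompt. Assumes ProLong keeps the
# stock Llama-3 template; verify against tokenizer.apply_chat_template.
def build_llama3_prompt(messages):
    """messages: list of {"role": ..., "content": ...} dicts."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant turn so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt(
    [{"role": "user", "content": "Summarize the attached report."}]
)
```

In practice, prefer the model's own tokenizer (e.g. `AutoTokenizer.from_pretrained("princeton-nlp/Llama-3-8B-ProLong-64k-Instruct")` followed by `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`) rather than hand-building the string.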

Key Capabilities & Training

  • Long-Context Processing: This model is designed to process inputs up to 64K tokens, making it suitable for tasks requiring understanding and generating content over moderately long documents or conversations.
  • Instruction Following: As an instruction-tuned model, it is optimized to follow user instructions effectively, making it versatile for various NLP applications.
  • Training Methodology: The ProLong models undergo continued training on specialized long-context datasets (e.g., princeton-nlp/prolong-data-64K) and are then supervised fine-tuned (SFT) on datasets such as UltraChat. Development involved extensive ablations over both the continued pre-training data and the SFT data to optimize long-context performance, as detailed in the paper "How to Train Long-Context Language Models (Effectively)".

Use Cases

This model is particularly well-suited for applications that benefit from processing and generating content within a 64K token context window, such as:

  • Document Analysis: Summarizing, querying, or extracting information from lengthy texts.
  • Extended Conversations: Maintaining coherence and context over long dialogue turns.
  • Code Understanding: Analyzing and generating code snippets within a larger codebase context.
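For document-analysis workloads like these, it is worth checking that an input actually fits the 64K window before sending it. The sketch below uses a rough characters-per-token heuristic (an assumed ~4 characters per token, a common rule of thumb, not this tokenizer's measured rate) to budget a document alongside a reserved generation budget; for exact counts, tokenize with the model's own tokenizer:

```python
# Rough context-budget check for the 64K window. The 4-chars-per-token
# ratio is a heuristic assumption, not this tokenizer's actual rate;
# use the model's tokenizer for exact counts.
CTX_TOKENS = 64 * 1024
CHARS_PER_TOKEN = 4  # heuristic estimate

def fits_in_context(document: str, reserve_for_output: int = 1024) -> bool:
    """Estimate whether `document` plus a generation budget fits in 64K tokens."""
    est_tokens = len(document) / CHARS_PER_TOKEN
    return est_tokens + reserve_for_output <= CTX_TOKENS

def max_document_chars(reserve_for_output: int = 1024) -> int:
    """Approximate character budget left for the document itself."""
    return (CTX_TOKENS - reserve_for_output) * CHARS_PER_TOKEN
```

Documents that exceed the budget can then be chunked or summarized hierarchically before being passed to the model.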