princeton-nlp/Llama-3-8B-ProLong-64k-Base

Text Generation · Model Size: 8B · Quantization: FP8 · Context Length: 64K · Published: Jul 22, 2024 · License: llama3 · Architecture: Transformer

The princeton-nlp/Llama-3-8B-ProLong-64k-Base is an 8-billion-parameter base model from the ProLong family, developed by Princeton NLP. It was produced by continued training of Llama-3-8B with a focus on long-context understanding, and supports a context window of up to 64K tokens. The model is designed for applications that process and generate text over extended document lengths.
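A minimal loading-and-generation sketch with the transformers library is shown below; the dtype, device placement, and decoding settings are illustrative assumptions, not the authors' recommended configuration:

```python
# A minimal sketch, assuming a standard transformers + PyTorch setup;
# generation settings here are illustrative, not the authors' recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Llama-3-8B-ProLong-64k-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 8B weights near ~16 GB
    device_map="auto",           # spread layers across available GPUs
)

# Base (non-instruct) model: plain text continuation, no chat template.
prompt = "The key challenge in long-context language modeling is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base checkpoint rather than an instruction-tuned one, it expects plain text to continue, not chat-formatted prompts.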


ProLong-64k-Base: Long-Context Llama-3-8B

This model, princeton-nlp/Llama-3-8B-ProLong-64k-Base, is part of the ProLong (Princeton long-context language models) family, developed by Princeton NLP. It is an 8-billion-parameter base model, produced by continued training from Llama-3-8B and specifically engineered for enhanced long-context capabilities.

Key Capabilities

  • Extended Context Window: Supports a context window of up to 64K tokens, making it suitable for processing and generating content from lengthy documents (a quick config check is shown after this list).
  • Base Model: Serves as a foundational model within the ProLong series, which also includes instruction-tuned variants and models with even larger context windows (up to 512K tokens).
  • Research-Backed Training: Built on the extensive ablations reported in the paper "How to Train Long-Context Language Models (Effectively)", which studies effective data mixtures for long-context continued training and supervised fine-tuning (SFT).
  • Llama-3-8B Foundation: Benefits from the strong base capabilities of the Llama-3-8B architecture.
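The advertised 64K window can be sanity-checked directly from the model's configuration. A minimal check, assuming the repository ships a standard Llama-style config (the field names are transformers' usual ones, and the printed values are expectations, not confirmed from the repo):

```python
# A quick sanity check of the 64K context window, assuming a standard
# Llama-style config; expected values are assumptions, not confirmed.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("princeton-nlp/Llama-3-8B-ProLong-64k-Base")
print(config.max_position_embeddings)  # expected: 65536 for the 64K variant
print(config.rope_theta)               # RoPE base, typically raised for long context
```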

Good For

  • Long Document Analysis: Ideal for tasks such as summarizing, querying, or generating content based on very long texts, articles, or codebases.
  • Further Fine-tuning: As a base model, it provides a strong starting point for custom fine-tuning on specific long-context tasks or domains (see the sketch after this list).
  • Research and Development: Useful for researchers exploring long-context language model behavior and performance, particularly given its detailed training methodology.
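Since the card positions this checkpoint as a fine-tuning starting point, below is a hypothetical parameter-efficient fine-tuning sketch using the PEFT library. The LoRA hyperparameters and target modules are illustrative defaults, not values from the ProLong paper:

```python
# A hypothetical LoRA fine-tuning sketch using PEFT; hyperparameters and
# target modules are illustrative defaults, not from the ProLong paper.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "princeton-nlp/Llama-3-8B-ProLong-64k-Base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                      # adapter rank
    lora_alpha=32,             # scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the 8B weights train
# From here, pass `model` to a standard transformers Trainer / SFT training loop.
```

Training only low-rank adapters keeps memory manageable at long sequence lengths, which matters when fine-tuning with inputs approaching the 64K window.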