princeton-nlp/Llama-3-8B-ProLong-512k-Base

Hosted on Hugging Face · Text Generation

Model size: 8B · Quantization: FP8 · Context length: 8K as listed (the model supports up to 512K tokens) · Published: Aug 22, 2024 · License: llama3 · Architecture: Transformer

princeton-nlp/Llama-3-8B-ProLong-512k-Base is an 8 billion parameter base model developed by Princeton NLP, part of the ProLong family of long-context language models. It was produced by continued training from Llama-3-8B and supports an extended context window of up to 512K tokens. The model is designed specifically for long-context understanding, making it well suited to tasks that require analyzing extensive text.


ProLong-512k-Base Overview

princeton-nlp/Llama-3-8B-ProLong-512k-Base is an 8 billion parameter base model from the ProLong family, developed by Princeton NLP. It is built on the Llama-3-8B architecture and was continually trained specifically to handle exceptionally long contexts, supporting a maximum context window of 512K tokens. This model is the foundation of the ProLong series, which also includes instruct-tuned variants that have demonstrated strong performance among long-context models at the 10B scale, as evaluated on the HELMET benchmark.

Key Capabilities

  • Extended Context Window: Processes up to 512K tokens, a 64× extension over the original Llama-3-8B's 8K window.
  • Llama-3-8B Foundation: Benefits from the robust architecture and pre-training of the Llama-3-8B model.
  • Research-Backed Training: Developed based on thorough ablations and findings detailed in the paper "How to Train Long-Context Language Models (Effectively)" (arXiv:2410.02660).
  • Base Model: Provides a strong foundation for further fine-tuning or specific application development.
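As a base model, it can be loaded with the standard Hugging Face transformers API. The sketch below is a minimal, hedged example: the `torch_dtype` and `device_map` choices are illustrative assumptions (not specified on this card), and loading requires enough memory for 8B parameters. The `MAX_CONTEXT_TOKENS` constant assumes 512K means 512 × 1024 positions.

```python
def load_prolong(model_id: str = "princeton-nlp/Llama-3-8B-ProLong-512k-Base"):
    """Load the ProLong base model and its tokenizer.

    Requires `transformers` and `torch` to be installed; the dtype and
    device_map below are illustrative choices, not from the model card.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # adjust for your hardware (e.g. float16)
        device_map="auto",           # shard across available devices
    )
    return tokenizer, model


# Assumed interpretation of the "512k" in the model name: 512 * 1024 positions.
MAX_CONTEXT_TOKENS = 512 * 1024
```

Because this is a base (not instruct-tuned) model, prompt it with plain text completion rather than a chat template.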

Good For

  • Long Document Analysis: Ideal for tasks involving very long texts, such as legal documents, research papers, or extensive codebases.
  • Custom Fine-tuning: Serves as an excellent base for developers looking to create specialized long-context models for particular domains.
  • Research and Development: Useful for exploring and advancing techniques in long-context language understanding and generation.
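For long-document workloads, a useful first check is whether a document fits in the 512K-token window at all. The helper below is a rough sketch using a ~4 characters-per-token heuristic for English text, which is an assumption rather than a property of this model's tokenizer; for exact counts, tokenize with the model's own tokenizer.

```python
def fits_in_context(text: str,
                    max_tokens: int = 512 * 1024,
                    chars_per_token: float = 4.0) -> bool:
    """Estimate whether `text` fits in the 512K-token context window.

    chars_per_token ~ 4 is a common heuristic for English text, not a
    tokenizer guarantee; use the model's tokenizer for exact budgeting.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= max_tokens


# A 1M-character document is roughly 250K estimated tokens: well within 512K.
assert fits_in_context("a" * 1_000_000)
```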