ProLong-64k-Base: Long-Context Llama-3-8B
This model, princeton-nlp/Llama-3-8B-ProLong-64k-Base, is part of the ProLong (Princeton long-context language models) family, developed by Princeton NLP. It is an 8-billion-parameter base model produced by continued training from Llama-3-8B, specifically engineered for enhanced long-context capabilities.
Key Capabilities
- Extended Context Window: Supports a context window of up to 64K tokens, making it suitable for processing and generating content from lengthy documents.
- Base Model: Serves as a foundational model within the ProLong series, which also includes instruction-tuned variants and models with even larger context windows (up to 512K tokens).
- Research-Backed Training: Developed based on extensive ablations and findings detailed in the paper "How to Train Long-Context Language Models (Effectively)", focusing on optimal long-context pre-training data and SFT data.
- Llama-3-8B Foundation: Benefits from the strong base capabilities of the Llama-3-8B architecture.
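As a sketch of how the capabilities above translate into use, the model can be loaded with the Hugging Face `transformers` library like any other causal LM; the input file `long_report.txt` and the `fits_in_context` helper below are illustrative assumptions, not part of the official recipe.

```python
MODEL_ID = "princeton-nlp/Llama-3-8B-ProLong-64k-Base"
MAX_CONTEXT = 64 * 1024  # 64K-token context window

def fits_in_context(prompt_tokens: int, max_new_tokens: int = 512) -> bool:
    """Return True if a prompt of `prompt_tokens` tokens leaves room for generation."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT

def main() -> None:
    # transformers is imported lazily so the helper above stays usable without it;
    # loading an 8B model requires substantial GPU memory (or CPU offloading).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    with open("long_report.txt") as f:  # hypothetical lengthy input document
        prompt = f.read()

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    n_prompt = inputs["input_ids"].shape[-1]
    if not fits_in_context(n_prompt):
        raise ValueError(f"prompt of {n_prompt} tokens exceeds the 64K window")

    # Base-model completion: the model continues the text rather than
    # following instructions (use the instruct variants for chat-style use).
    out = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(out[0][n_prompt:], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

Note that `torch_dtype="auto"` and `device_map="auto"` are convenience settings; adjust them to your hardware.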
Good For
- Long Document Analysis: Suited to tasks such as summarizing, querying, or generating content grounded in very long texts, articles, or codebases. Note that as a base model it works best with completion-style prompting or after instruction tuning.
- Further Fine-tuning: As a base model, it provides a strong starting point for custom fine-tuning on specific long-context tasks or domains.
- Research and Development: Useful for researchers studying long-context language model behavior and performance, particularly given its publicly documented training methodology.