zai-org/LongAlign-13B-64k-base
The LongAlign-13B-64k-base model, developed by THUDM, is a 13-billion-parameter Llama-2-13B base model extended to support a 64k-token context window. This model is designed for long-context understanding and processing, leveraging the LongAlign-10k dataset and specialized training strategies such as packing with loss weighting and sorted batching. It is built for tasks requiring comprehension and generation over very long inputs, making it suitable for applications demanding extensive contextual awareness.
LongAlign-13B-64k-base Overview
LongAlign-13B-64k-base is a 13-billion-parameter language model developed by THUDM, built upon the Llama-2-13B architecture. Its primary distinguishing feature is a significantly extended 64k-token context window, a substantial increase over the base Llama-2 model's typical context length. This model is part of the broader LongAlign initiative, which focuses on advancing LLM alignment for long-context understanding.
Key Capabilities and Training
- Extended Context Window: Supports a 64k token context, enabling processing of very long documents and conversations.
- Long-Context Alignment: Trained using the novel LongAlign-10k dataset, which comprises 10,000 long instruction data samples ranging from 8k to 64k tokens in length.
- Optimized Training Strategies: Incorporates specialized training techniques such as packing with loss weighting and sorted batching to efficiently handle and learn from long sequences.
- Base Model for Chat: This `-base` variant serves as the foundation for the instruction-tuned LongAlign-13B-64k chat model, indicating its suitability for further fine-tuning on conversational tasks requiring extensive context.
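To make the training strategies above concrete, here is a minimal sketch of packing with loss weighting: sequences are greedily packed into bins of at most the context length, and each token receives a weight inversely proportional to its sequence's length so every sequence contributes equally to the batch loss. This illustrates the general idea only; it is not necessarily THUDM's exact formulation, and the function and variable names are invented for this example.

```python
def pack_with_weights(seq_lengths, max_len):
    """Sketch of packing with loss weighting (illustrative, not THUDM's code).

    Greedily packs sequences (given by their token lengths) into bins of
    at most `max_len` tokens, first-fit. Also returns a per-sequence
    loss weight of 1/length, so that summing weighted token losses
    gives every original sequence equal influence, regardless of how
    many sequences share a pack.
    """
    packs = []  # each pack is a list of (sequence_index, length) pairs
    for i, n in enumerate(seq_lengths):
        for pack in packs:
            if sum(length for _, length in pack) + n <= max_len:
                pack.append((i, n))
                break
        else:
            packs.append([(i, n)])
    weights = {i: 1.0 / n for i, n in enumerate(seq_lengths)}
    return packs, weights

packs, weights = pack_with_weights([3, 5, 2, 4], max_len=8)
# Sequences 0 and 1 fill the first 8-token pack; 2 and 3 fill the second.
```

Sorted batching, the other strategy mentioned, is simpler still: ordering training examples by length before batching keeps similarly sized sequences together, reducing padding waste and idle time when lengths span 8k to 64k tokens.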
Use Cases
This model is particularly well-suited for applications that require processing and generating responses based on very long inputs, such as:
- Summarizing lengthy documents or articles.
- Answering complex questions from extensive knowledge bases.
- Engaging in prolonged, context-aware conversations.
- Tasks demanding deep understanding across large spans of text.
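Because this is a base (not instruction-tuned) checkpoint, it is used via plain text completion: frame tasks as continuations rather than instructions. A minimal usage sketch with the standard Hugging Face `transformers` API follows; the repository id is taken from the title above, and the dtype and device settings are assumptions you should adjust to your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/LongAlign-13B-64k-base"  # repository id as shown above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 13B in bf16 needs roughly 26 GB of accelerator memory
    device_map="auto",
)

# Base-model completion: append a cue and let the model continue.
long_document = open("report.txt").read()  # up to ~64k tokens of context
prompt = long_document + "\n\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

For conversational use, the instruction-tuned LongAlign-13B-64k chat variant mentioned above is the more appropriate starting point.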