DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K

Public · 7.6B parameters · FP8 · 32768 · Released Feb 17, 2025 · License: apache-2.0
Overview

LongPO: Long Context Self-Evolution for LLMs

DAMO-NLP-SG's Qwen2.5-7B-LongPO-128K is a 7-billion-parameter model trained with LongPO (Long Context Self-Evolution through Short-to-Long Preference Optimization). LongPO extends the model's context length to 128K tokens while maintaining strong alignment and performance, and it does so without relying on human or superior-LLM annotations for long-context alignment data. The training process is also designed to prevent degradation of short-context capabilities, so the model remains versatile across input lengths.
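A minimal usage sketch with the standard Hugging Face transformers API is shown below. The helper names (`build_messages`, `answer_over_long_document`) and the sampling settings are illustrative assumptions, not part of the model release; loading the 7B checkpoint requires substantial GPU memory.

```python
# Hedged usage sketch, assuming the standard transformers chat API.
MODEL_ID = "DAMO-NLP-SG/Qwen2.5-7B-LongPO-128K"


def build_messages(document: str, question: str) -> list[dict]:
    """Pack a long document plus a question into a chat-style prompt."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"{document}\n\nQuestion: {question}"},
    ]


def answer_over_long_document(document: str, question: str) -> str:
    """Run one long-context query against the model."""
    # transformers is imported lazily so the prompt helper above can be
    # used without the dependency installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(document, question),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Because the entire document fits in a single 128K window, no retrieval or chunking pipeline is needed for inputs within that budget.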

Key Capabilities

  • Self-Evolving Long-Context Alignment: Achieves extended context understanding without external annotations.
  • One-Stage Context Extension: Integrates context length extension and alignment into a single training phase.
  • Preserved Short-Context Performance: Maintains strong performance on tasks requiring shorter contexts.
  • Enhanced Long-Context Reasoning: Demonstrates improved performance on long-context benchmarks like InfiniteBench and RULER compared to its base model, Qwen2.5-7B-Instruct.

Good for

  • Applications requiring robust performance across both very long and short context windows.
  • Tasks such as summarizing extensive documents, complex question answering over large texts, and processing long codebases.
  • Scenarios where maintaining high accuracy on traditional short-context tasks is as important as handling extended inputs.
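Before sending a large input, it can help to sanity-check that it fits in the 128K-token window. The sketch below uses a rough characters-per-token heuristic (the ~4 chars/token figure is an assumption, not an exact property of the Qwen2.5 tokenizer; use the real tokenizer for a precise count).

```python
# Heuristic context-window check. CHARS_PER_TOKEN is an assumed rough
# average; the actual ratio depends on the tokenizer and the text.
CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str) -> int:
    """Very rough token-count estimate from character length."""
    return max(1, round(len(text) / CHARS_PER_TOKEN))


def fits_in_context(text: str, reserved_for_output: int = 1024) -> bool:
    """True if the text plus a generation budget fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW
```

Inputs that fail this check would need to be truncated or split, but anything passing it can be handed to the model in one shot.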