Tongyi-Zhiwen/QwenLong-L1-32B

Parameters: 32B
Precision: FP8
Max position embedding: 32,768 tokens
License: apache-2.0

Overview

QwenLong-L1-32B: Long-Context Reasoning with Reinforcement Learning

QwenLong-L1-32B, developed by Tongyi Lab at Alibaba Group, is a 32-billion-parameter model designed for robust long-context reasoning. It is presented as the first long-context Large Reasoning Model (LRM) trained with a novel reinforcement learning (RL) framework. This framework extends short-context LRMs through progressive context scaling during RL training, combining a warm-up supervised fine-tuning phase, a curriculum-guided RL phase, and a difficulty-aware retrospective sampling mechanism.
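To make the training recipe above concrete, here is an illustrative sketch (not the authors' implementation) of the two mechanisms it names: curriculum-guided progressive context scaling, where successive RL stages admit longer inputs, and difficulty-aware retrospective sampling, where harder examples are preferentially revisited. The stage caps, the difficulty proxy, and all field names are hypothetical.

```python
import random

# Hypothetical per-stage input-length caps; each curriculum stage admits
# progressively longer contexts.
STAGE_CONTEXT_CAPS = [20_000, 60_000]

def difficulty(example):
    """Hypothetical proxy: examples with low average past reward are 'hard'."""
    return 1.0 - example["avg_reward"]

def build_stage_pool(pool, cap, k, rng=random):
    """Sample k training examples for one curriculum stage.

    Only examples within the stage's context cap are eligible; harder
    examples (including ones revisited from earlier stages) are drawn
    with proportionally higher probability.
    """
    eligible = [ex for ex in pool if ex["length"] <= cap]
    weights = [difficulty(ex) for ex in eligible]
    return rng.choices(eligible, weights=weights, k=k)

# Toy pool: the long example only becomes eligible in stage 2.
pool = [
    {"length": 8_000, "avg_reward": 0.9},   # easy, short
    {"length": 18_000, "avg_reward": 0.2},  # hard, fits stage 1
    {"length": 55_000, "avg_reward": 0.4},  # only fits stage 2
]
stage1 = build_stage_pool(pool, STAGE_CONTEXT_CAPS[0], k=4)
```

The key design point is that sampling is weighted rather than greedy, so easy examples are still occasionally replayed while hard ones dominate each stage's pool.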

Key Capabilities and Features

  • Reinforcement Learning for Long Contexts: Utilizes a unique RL framework to transition from short-context proficiency to strong long-context generalization.
  • Superior DocQA Performance: Achieves leading performance on seven long-context DocQA benchmarks, including mathematical, logical, and multi-hop reasoning tasks.
  • Competitive Benchmarking: Outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B, with performance on par with Claude-3.7-Sonnet-Thinking.
  • Extended Context Handling: Validated for context lengths up to 131,072 tokens using the YaRN scaling method, with an original max position embedding of 32,768 tokens.
  • Specialized Training Dataset: Trained with DocQA-RL-1.6K, a dataset comprising 1.6K document question answering problems across diverse reasoning domains.
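The context-extension figures above can be sketched as a YaRN scaling configuration. The `rope_scaling` keys below follow the common Hugging Face convention used for YaRN on Qwen-family models; verify them against your transformers version before relying on them.

```python
# Native and extended context lengths from the model card.
native_ctx = 32_768    # original max position embedding
target_ctx = 131_072   # validated long-context length via YaRN

# YaRN scales rotary positions by the ratio of target to native length.
rope_scaling = {
    "type": "yarn",
    "factor": target_ctx / native_ctx,  # 131,072 / 32,768 = 4.0
    "original_max_position_embeddings": native_ctx,
}
print(rope_scaling["factor"])  # 4.0
```

Enabling YaRN only when inputs actually exceed the native 32,768-token window avoids degrading short-context quality.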

Good for

  • Complex Document Analysis: Ideal for applications requiring deep understanding and reasoning over very long documents, such as financial reports, legal texts, or research papers.
  • Advanced Question Answering: Excels in document question answering tasks that demand mathematical, logical, or multi-hop reasoning.
  • Benchmarking and Research: Provides a strong baseline and research platform for exploring reinforcement learning in long-context LLMs.