Tongyi-Zhiwen/QwenLong-L1-32B
Text Generation · Model Size: 32B · Quant: FP8 · Ctx Length: 32k · Published: May 23, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

QwenLong-L1-32B is a 32 billion parameter long-context large reasoning model developed by Tongyi Lab, Alibaba Group. It is presented as the first long-context LRM trained with reinforcement learning (RL) to enhance long-context reasoning. The model performs strongly on document question answering (DocQA) benchmarks, outperforming several flagship LRMs and achieving performance comparable to Claude-3.7-Sonnet-Thinking. It is optimized for robust long-context generalization across mathematical, logical, and multi-hop reasoning tasks.


QwenLong-L1-32B: Long-Context Reasoning with Reinforcement Learning

QwenLong-L1-32B, developed by Tongyi Lab, Alibaba Group, is a 32 billion parameter model specifically designed for robust long-context reasoning. It stands out as the first long-context Large Reasoning Model (LRM) trained using a novel reinforcement learning (RL) framework. This framework enhances short-context LRMs through progressive context scaling during RL training, incorporating a warm-up supervised fine-tuning phase, a curriculum-guided RL phase, and a difficulty-aware retrospective sampling mechanism.

Key Capabilities and Features

  • Reinforcement Learning for Long Contexts: Utilizes a unique RL framework to transition from short-context proficiency to strong long-context generalization.
  • Superior DocQA Performance: Achieves leading performance on seven long-context DocQA benchmarks, including mathematical, logical, and multi-hop reasoning tasks.
  • Competitive Benchmarking: Outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B, with performance on par with Claude-3.7-Sonnet-Thinking.
  • Extended Context Handling: Validated for context lengths up to 131,072 tokens using the YaRN scaling method, with an original max position embedding of 32,768 tokens.
  • Specialized Training Dataset: Trained with DocQA-RL-1.6K, a dataset comprising 1.6K document question answering problems across diverse reasoning domains.
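The extended context handling above relies on YaRN RoPE scaling: the native 32,768-token position embeddings are stretched by a scaling factor to reach the validated 131,072-token window. A minimal sketch of the corresponding `rope_scaling` configuration, following the convention used by Qwen-family models (verify the exact keys against the model's `config.json` before use):

```python
# Sketch: YaRN context extension settings for QwenLong-L1-32B.
# Key names follow the Qwen-family convention and are an assumption here;
# check the model's actual config.json / serving docs.

ORIGINAL_MAX_POS = 32_768   # native max position embeddings
TARGET_CONTEXT = 131_072    # validated extended context length

rope_scaling = {
    "rope_type": "yarn",
    # Scaling factor = target context / native context
    "factor": TARGET_CONTEXT / ORIGINAL_MAX_POS,  # 4.0
    "original_max_position_embeddings": ORIGINAL_MAX_POS,
}
print(rope_scaling["factor"])  # 4.0
```

Note that enabling YaRN statically can slightly degrade quality on short inputs, so it is typically turned on only when prompts actually exceed the native window.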

Good for

  • Complex Document Analysis: Ideal for applications requiring deep understanding and reasoning over very long documents, such as financial reports, legal texts, or research papers.
  • Advanced Question Answering: Excels in document question answering tasks that demand mathematical, logical, or multi-hop reasoning.
  • Benchmarking and Research: Provides a strong baseline and research platform for exploring reinforcement learning in long-context LLMs.
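For the document-analysis use cases above, the usual pattern is to place the full document and the question in a single prompt. A minimal sketch of such a prompt builder; the template and `max_doc_chars` guard are illustrative assumptions, not the model's official chat template:

```python
def build_docqa_prompt(document: str, question: str,
                       max_doc_chars: int = 400_000) -> str:
    """Format a long-document QA prompt.

    Illustrative template only; production code should apply the
    model's chat template and count tokens, not characters.
    """
    doc = document[:max_doc_chars]  # crude guard against overlong inputs
    return (
        "Please read the following document and answer the question.\n\n"
        f"<document>\n{doc}\n</document>\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_docqa_prompt(
    "Revenue grew 12% in Q3, driven by cloud services.",
    "How much did revenue grow in Q3?",
)
```

The resulting string is then passed to the model's generation endpoint; with YaRN enabled, documents far beyond the native 32k window can be included.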