Qwen/Qwen3-Next-80B-A3B-Instruct

Warm · Public · 80B · FP8 · 32768 · 4 · Sep 9, 2025 · License: apache-2.0 · Hugging Face

Qwen/Qwen3-Next-80B-A3B-Instruct is an 80-billion-parameter instruction-tuned causal language model from Qwen, combining a hybrid attention mechanism with a high-sparsity Mixture-of-Experts (MoE) architecture. It natively supports ultra-long contexts of up to 262,144 tokens, extensible to roughly 1 million tokens via YaRN. The model emphasizes parameter efficiency and inference speed, particularly for long-context tasks, and performs strongly across knowledge, reasoning, coding, and alignment benchmarks.
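The "A3B" suffix indicates roughly 3 billion activated parameters out of the 80 billion total. A quick back-of-envelope calculation (figures inferred from the model name, not measured) illustrates the sparsity:

```python
# Back-of-envelope MoE sparsity for Qwen3-Next-80B-A3B.
# Numbers are inferred from the model name: ~80B total, ~3B activated per token.
total_params = 80e9
active_params = 3e9

activation_ratio = active_params / total_params  # fraction of weights used per token
print(f"Activation ratio: {activation_ratio:.1%}")              # Activation ratio: 3.8%
print(f"Dense-equivalent savings: ~{total_params / active_params:.0f}x")  # ~27x
```

Only a small slice of the experts fires per token, which is why per-token FLOPs stay far below what a dense 80B model would require.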

Overview

Qwen3-Next-80B-A3B-Instruct: Next-Generation Efficiency and Long Context

Qwen3-Next-80B-A3B-Instruct is the first model in Qwen's Qwen3-Next series, which focuses on improving scaling efficiency through new architectural designs. This 80-billion-parameter instruction-tuned model targets the growing demand for capable, agentic AI with extensive context windows.

Key Capabilities & Innovations

  • Hybrid Attention: Integrates Gated DeltaNet and Gated Attention for highly efficient context modeling, supporting ultra-long context lengths.
  • High-Sparsity Mixture-of-Experts (MoE): Features an extremely low activation ratio in MoE layers, significantly reducing FLOPs per token while maintaining model capacity.
  • Multi-Token Prediction (MTP): Boosts pretraining performance and accelerates inference, though not generally available in Hugging Face Transformers.
  • Ultra-Long Context: Natively supports 262,144 tokens and is extensible up to 1,010,000 tokens using YaRN scaling techniques, demonstrating strong performance on the 1M RULER benchmark.
  • Robust Performance: Achieves competitive results across various benchmarks, including MMLU-Pro, GPQA, LiveCodeBench, and Arena-Hard v2, often performing on par with or surpassing larger models like Qwen3-235B-A22B-Instruct-2507 in certain areas, especially for long-context tasks.
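Context extension beyond the native window is typically configured through a `rope_scaling` block in the model config, following Hugging Face Transformers conventions. A minimal sketch of the arithmetic is below; the exact scaling factor for this model is an assumption, chosen so the extended window covers the 1,010,000-token target:

```python
# YaRN-style context extension via a rope_scaling config block (sketch).
# The factor value is an assumption, not the model's shipped configuration.
native_context = 262_144
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,  # assumed: 262,144 x 4.0 covers the ~1M-token target
    "original_max_position_embeddings": native_context,
}

extended_context = int(native_context * rope_scaling["factor"])
print(extended_context)  # 1048576, above the 1,010,000-token target
```

Note that YaRN scaling can slightly degrade quality at short contexts, so it is usually enabled only when long inputs are actually expected.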

When to Use This Model

Qwen3-Next-80B-A3B-Instruct is particularly well-suited for applications requiring:

  • Extreme Long-Context Processing: Ideal for tasks involving extensive documents, codebases, or conversational histories where context length is critical.
  • High Inference Throughput: Offers significant inference speed advantages for contexts over 32K tokens, making it efficient for demanding workloads.
  • Agentic AI Development: Excels at tool calling; the Qwen-Agent framework is recommended for the best agentic performance.
  • Resource-Efficient Deployment: Despite its large parameter count, its MoE architecture and stability optimizations contribute to parameter efficiency and robust training.
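For high-throughput or long-context deployment, one common option is to expose the model behind an OpenAI-compatible server with vLLM. A minimal sketch follows; the flag values (tensor-parallel degree, max context) are assumptions to be adapted to your hardware, not vendor-recommended settings:

```shell
# Minimal vLLM serving sketch (assumed flag values; adjust for your GPUs).
vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct \
  --tensor-parallel-size 4 \
  --max-model-len 262144
```

Lower `--max-model-len` if your deployment never needs the full native window, since shorter limits reduce KV-cache memory pressure.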