unsloth/Qwen3-235B-A22B-Instruct-2507

Text Generation | Concurrency Cost: 4 | Model Size: 235B | Quant: FP8 | Ctx Length: 32k | Published: Jul 21, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

The Qwen3-235B-A22B-Instruct-2507 model by Qwen is a 235 billion parameter causal language model with 22 billion activated parameters and a native context length of 262,144 tokens. This updated instruction-tuned version significantly enhances general capabilities including instruction following, logical reasoning, mathematics, coding, and tool usage. It also shows substantial gains in long-tail knowledge coverage across multiple languages and improved alignment for subjective and open-ended tasks. Optimized for non-thinking mode, it excels in complex reasoning and agentic applications.


Qwen3-235B-A22B-Instruct-2507: Enhanced Instruction-Following MoE Model

This model is an updated release in the Qwen3-235B-A22B series, tuned for instruction following and operating without a 'thinking mode'. It is a mixture-of-experts (MoE) model with 235 billion total parameters, of which 22 billion are activated per token, and a native context length of 262,144 tokens, which suits it to long-context understanding.
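
As a rough illustration of what the parameter figures above imply, the following sketch computes the share of weights active per token and the approximate storage for the weights at FP8. It assumes FP8 stores one byte per weight and uses the rounded counts quoted here; it ignores the KV cache, activations, and runtime overhead, so treat it as back-of-the-envelope arithmetic, not a deployment requirement.

```python
# Rough sizing arithmetic from the rounded figures quoted in this listing.
TOTAL_PARAMS = 235e9       # total parameters
ACTIVE_PARAMS = 22e9       # parameters activated per token (MoE routing)
BYTES_PER_PARAM_FP8 = 1    # assumption: FP8 stores one byte per weight


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in a single forward pass."""
    return active / total


def weight_bytes(params: float, bytes_per_param: float) -> float:
    """Approximate storage for the weights alone (no KV cache, no overhead)."""
    return params * bytes_per_param


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    size_gb = weight_bytes(TOTAL_PARAMS, BYTES_PER_PARAM_FP8) / 1e9
    print(f"active fraction: {frac:.1%}")     # prints "active fraction: 9.4%"
    print(f"FP8 weights: ~{size_gb:.0f} GB")  # prints "FP8 weights: ~235 GB"
```

Under these assumptions, all of the weights must still be resident in memory even though only about a tenth of them are used for any given token.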

Key Capabilities and Enhancements

  • General Capabilities: Significant improvements across instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.
  • Multilingual Knowledge: Substantial gains in long-tail knowledge coverage across various languages.
  • User Alignment: Markedly better alignment with user preferences for subjective and open-ended tasks, leading to more helpful and higher-quality text generation.
  • Agentic Use: Excels at tool calling; Qwen-Agent is recommended for getting the best results from these capabilities.
  • Performance Benchmarks: Demonstrates strong performance, often outperforming its predecessor and competing models, on benchmarks such as GPQA, AIME25, HMMT25, ARC-AGI, ZebraLogic, LiveCodeBench, and MultiIF.
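
To make the tool-calling capability above more concrete, here is a minimal sketch of the application side of a tool-call loop: a tool description plus a dispatcher that runs a call the model has requested. It does not use Qwen-Agent. The JSON-schema "function" layout is an assumption based on the style common to OpenAI-compatible chat APIs, and `get_weather`, `TOOLS`, `REGISTRY`, and `dispatch_tool_call` are hypothetical names introduced for illustration only.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical tool: a stand-in for any real function an agent may call.
def get_weather(city: str) -> str:
    return f"Weather lookup for {city} is not implemented in this sketch."


# Tool description in the JSON-schema "function" style used by many
# OpenAI-compatible chat APIs (an assumption; check your serving stack).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may emit to local Python callables.
REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(call: Dict[str, Any]) -> Dict[str, str]:
    """Run one requested tool call and wrap the result as a 'tool' message."""
    name = call["name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    args = call["arguments"]
    if isinstance(args, str):  # arguments often arrive as a JSON string
        args = json.loads(args)
    result = REGISTRY[name](**args)
    return {"role": "tool", "name": name, "content": str(result)}
```

In a full loop, the returned message would be appended to the conversation and sent back to the model so it can compose its final answer; a framework such as Qwen-Agent handles that orchestration for you.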

When to Use This Model

This model is ideal for applications requiring:

  • Advanced Instruction Following: For tasks where precise adherence to instructions is critical.
  • Complex Reasoning: Excels in mathematical, scientific, and logical reasoning challenges.
  • Long-Context Processing: Its native context window of 262,144 tokens is beneficial for tasks requiring deep understanding of extensive documents or conversations.
  • Agent-Based Systems: Strong tool-calling abilities make it suitable for integrating with external tools and building sophisticated agents.
  • Multilingual Applications: Improved knowledge coverage across multiple languages supports diverse global use cases.
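
For the long-context use case above, a small budgeting helper can show how a prompt and its reserved output share one context window. This is a sketch under stated assumptions: token counts come from whatever tokenizer you already use, the function names are hypothetical, and the default limit is the native 262,144-token figure. A given deployment may enforce a smaller limit (this listing shows "Ctx Length: 32k"), in which case pass that value as `context_length`.

```python
from typing import List, Sequence

NATIVE_CONTEXT = 262_144  # native context length stated in this listing


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = NATIVE_CONTEXT) -> bool:
    """True if the prompt plus the reserved output fits in the window."""
    return prompt_tokens + max_new_tokens <= context_length


def split_into_windows(token_ids: Sequence[int], max_new_tokens: int,
                       context_length: int = NATIVE_CONTEXT) -> List[List[int]]:
    """Split token ids into consecutive chunks that leave room for output."""
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return [list(token_ids[i:i + budget])
            for i in range(0, len(token_ids), budget)]
```

Splitting into non-overlapping windows is the simplest possible strategy; real pipelines often add overlap or summarise earlier chunks, which this sketch does not attempt.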