hamishivi/Qwen3-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: May 14, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

Qwen3-8B is an 8.2 billion parameter causal language model from the Qwen team's Qwen3 series. It supports seamless switching between a "thinking mode" for complex reasoning tasks such as math, coding, and logical problems, and a "non-thinking mode" for general-purpose dialogue, delivering strong performance across diverse scenarios. It offers enhanced reasoning capabilities, improved human preference alignment for creative writing and multi-turn dialogue, and strong agentic abilities, with support for over 100 languages. The model natively supports a 32,768-token context length, extendable to 131,072 tokens using YaRN scaling.


Qwen3-8B: A Versatile Language Model with Adaptive Reasoning

Qwen3-8B is an 8.2 billion parameter causal language model from the Qwen series, designed for advanced reasoning and flexible conversational capabilities. It introduces a unique feature allowing seamless switching between a "thinking mode" for complex logical reasoning, mathematics, and code generation, and a "non-thinking mode" for efficient, general-purpose dialogue. This adaptive approach optimizes performance across various tasks.
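
As a concrete illustration, here is a minimal sketch of loading the model and toggling between the two modes with Hugging Face transformers. It assumes the upstream Qwen/Qwen3-8B checkpoint name and a recent transformers release; the enable_thinking flag is the same switch referenced in the best-practices section below.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # assumption: upstream Qwen3-8B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]

# enable_thinking=True renders the chat template with the reasoning block;
# set it to False for direct, non-thinking responses.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```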

Key Capabilities

  • Adaptive Reasoning: Dynamically switches between a detailed reasoning process and direct response generation, enhancing performance in both complex problem-solving and general conversation.
  • Enhanced Performance: Demonstrates significant improvements in reasoning benchmarks, surpassing previous Qwen models in mathematics, code generation, and commonsense logic.
  • Human Preference Alignment: Excels in creative writing, role-playing, and multi-turn dialogues, providing a more natural and engaging user experience.
  • Agentic Abilities: Offers strong tool-calling capabilities, integrating precisely with external tools in both thinking and non-thinking modes, with leading performance among open-source models on agent-based tasks.
  • Multilingual Support: Supports over 100 languages and dialects with robust multilingual instruction following and translation capabilities.
  • Extended Context Window: Natively handles up to 32,768 tokens, with validated support for up to 131,072 tokens using YaRN scaling techniques (see the sketch after this list).
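
The following is a hedged sketch of enabling YaRN via a config override at load time. The rope_scaling fields shown (rope_type, factor, original_max_position_embeddings) follow the commonly documented YaRN configuration for Qwen models, where a factor of 4.0 stretches the native 32,768-token window toward 131,072 tokens; verify the exact keys against the checkpoint's config.json and your transformers version.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",  # assumption: upstream checkpoint name
    torch_dtype="auto",
    device_map="auto",
    # Override the RoPE configuration to enable YaRN scaling.
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)
```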

Best Practices for Optimal Use

To achieve the best results, specific sampling parameters are recommended for each mode, as shown in the sketch after this list:

  • Thinking Mode (enable_thinking=True): Use Temperature=0.6, TopP=0.95, TopK=20, and MinP=0. Do not use greedy decoding, which can degrade output quality and cause repetition.
  • Non-Thinking Mode (enable_thinking=False): Use Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.
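
A minimal sketch applying these settings through model.generate, reusing the model, tokenizer, and inputs prepared in the first example; note that min_p requires a reasonably recent transformers release.

```python
# Thinking mode: temperature=0.6, top_p=0.95, top_k=20, min_p=0.
output_ids = model.generate(
    **inputs,
    do_sample=True,  # sampling, not greedy decoding
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
    max_new_tokens=1024,
)

# Non-thinking mode: temperature=0.7, top_p=0.8, top_k=20, min_p=0.
output_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    min_p=0.0,
    max_new_tokens=1024,
)
```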

For long-text processing, YaRN scaling can be enabled, though it is advised only when necessary, since static scaling can degrade performance on shorter texts. The model also provides soft switches (/think and /no_think) within user prompts for dynamic mode control in multi-turn conversations, as sketched below.
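
A hedged sketch of the soft switches in a multi-turn conversation, assuming the tokenizer from the first example. The switch is plain text appended to the user message, as described above, and the template follows the most recent switch in the conversation.

```python
messages = [
    {"role": "user", "content": "Summarize the plot of Hamlet. /no_think"},
    {"role": "assistant", "content": "Hamlet, prince of Denmark, ..."},
    # A later turn can re-enable the detailed reasoning process:
    {"role": "user", "content": "Now prove that sqrt(2) is irrational. /think"},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # soft switches apply when thinking is enabled
)
```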