Qwen3-4B-MLX-bf16 Overview
Qwen3-4B-MLX-bf16 is a 4-billion-parameter causal language model from the Qwen3 series, distributed in bfloat16 weights for Apple's MLX framework and covering both pretraining and post-training stages. Its key differentiator is seamless switching between a 'thinking' mode and a 'non-thinking' mode within a single model: thinking mode is optimized for complex logical reasoning, mathematics, and code generation, while non-thinking mode handles general-purpose dialogue more efficiently.
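In thinking mode, Qwen3 models emit their reasoning wrapped in `<think>...</think>` tags before the final answer. A minimal sketch of separating the two parts of a completion (the tag convention follows the Qwen3 model card; the helper name is illustrative, not part of any API):

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Split a thinking-mode completion into (reasoning, answer).

    Qwen3 wraps its chain of thought in <think>...</think>; anything
    after the closing tag is the user-facing answer.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # Non-thinking mode: no tags emitted, the whole output is the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_thinking(
    "<think>2 + 2 equals 4.</think>\nThe answer is 4."
)
# reasoning → "2 + 2 equals 4."   answer → "The answer is 4."
```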
Key Capabilities
- Adaptive Reasoning: Seamlessly switches between a reasoning-intensive 'thinking mode' and an efficient 'non-thinking mode' based on task requirements, enhancing performance in areas like math, coding, and commonsense reasoning.
- Human Preference Alignment: Excels in creative writing, role-playing, and multi-turn dialogues, delivering engaging conversational experiences.
- Agentic Functionality: Demonstrates strong capabilities in integrating with external tools, achieving leading performance in complex agent-based tasks among open-source models.
- Multilingual Support: Supports over 100 languages and dialects, offering robust multilingual instruction following and translation abilities.
- Extended Context: Natively supports a context length of 32,768 tokens, extendable up to 131,072 tokens using the YaRN method for processing long texts.
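Extending the context beyond the native 32,768 tokens with YaRN is typically configured via a `rope_scaling` entry in the model's `config.json`. A sketch of that fragment, assuming the factor-4 extension described above (32,768 × 4 = 131,072):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Static YaRN scaling applies to all inputs regardless of length, so it is generally best enabled only when long-context processing is actually needed.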
Best Practices
Optimal performance depends on mode-specific sampling parameters:
- Thinking mode: Temperature=0.6, TopP=0.95, TopK=20.
- Non-thinking mode: Temperature=0.7, TopP=0.8, TopK=20.
An output length of 32,768 tokens is sufficient for most queries; for highly complex problems, extend it to 38,912 tokens.
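The recommended settings can be kept in a small lookup so generation code selects them by mode. A minimal sketch using the values from this section (the constant and function names are illustrative, not part of any library API):

```python
# Recommended sampling parameters from the Qwen3 best practices above.
SAMPLING_PARAMS = {
    "thinking":     {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "non_thinking": {"temperature": 0.7, "top_p": 0.8,  "top_k": 20},
}

def sampling_params(thinking: bool) -> dict:
    """Return the recommended sampler settings for the chosen mode."""
    return SAMPLING_PARAMS["thinking" if thinking else "non_thinking"]

params = sampling_params(thinking=True)
# params → {"temperature": 0.6, "top_p": 0.95, "top_k": 20}
```

Keeping the two profiles in one table makes it harder to accidentally reuse thinking-mode sampling (lower temperature, higher TopP) for plain dialogue, where the non-thinking profile is recommended instead.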