AvraamBunder/merged

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.8BQuant:BF16Ctx Length:32kPublished:May 9, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

AvraamBunder/merged is a 0.8 billion parameter causal language model from the Qwen3 series, developed by Qwen. It uniquely supports seamless switching between a "thinking mode" for complex logical reasoning, math, and coding, and a "non-thinking mode" for efficient general-purpose dialogue. This model is optimized for enhanced reasoning capabilities, human preference alignment, and advanced agentic use with tool-calling support, making it suitable for diverse conversational and task-oriented AI applications.

Loading preview...

Qwen3-0.6B Overview

AvraamBunder/merged is a 0.8 billion parameter causal language model from the Qwen3 series, developed by Qwen, featuring a 32,768 token context length. A key differentiator is its unique ability to seamlessly switch between a "thinking mode" and a "non-thinking mode" within a single model. The thinking mode is designed for complex logical reasoning, mathematics, and coding, while the non-thinking mode handles efficient, general-purpose dialogue.

Key Capabilities

  • Enhanced Reasoning: Significantly improved performance in mathematics, code generation, and commonsense logical reasoning compared to previous Qwen models.
  • Human Preference Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural conversational experience.
  • Advanced Agent Capabilities: Offers strong tool-calling abilities, achieving leading performance among open-source models in complex agent-based tasks, especially when integrated with Qwen-Agent.
  • Multilingual Support: Supports over 100 languages and dialects with robust multilingual instruction following and translation capabilities.

Usage and Best Practices

The model allows explicit control over its thinking mode via an enable_thinking parameter or dynamic switching within user prompts using /think and /no_think tags. Optimal performance is achieved with specific sampling parameters: Temperature=0.6, TopP=0.95, TopK=20 for thinking mode, and Temperature=0.7, TopP=0.8, TopK=20 for non-thinking mode. It is recommended to use an output length of 32,768 tokens for most queries and up to 38,912 tokens for highly complex problems to ensure comprehensive responses.