cs-552-2026-momy/general_knowledge_model

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:May 11, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

Qwen3-1.7B is a 1.7 billion parameter causal language model from the Qwen series, developed by Qwen. It features a unique capability to seamlessly switch between a 'thinking mode' for complex logical reasoning, math, and coding, and a 'non-thinking mode' for efficient general-purpose dialogue. This model excels in reasoning, instruction-following, agent capabilities, and multilingual support across 100+ languages, with a context length of 32,768 tokens.

Loading preview...

Qwen3-1.7B Model Overview

Qwen3-1.7B is a 1.7 billion parameter causal language model from the Qwen series, designed for advanced reasoning and versatile applications. It introduces a novel feature allowing seamless switching between a 'thinking mode' for complex tasks like logical reasoning, mathematics, and code generation, and a 'non-thinking mode' for efficient general dialogue. This dual-mode functionality ensures optimized performance across diverse scenarios.

Key Capabilities

  • Dynamic Reasoning Modes: Uniquely supports explicit switching between a reasoning-focused 'thinking mode' and an efficiency-focused 'non-thinking mode' within a single model instance.
  • Enhanced Reasoning: Demonstrates significant improvements in mathematics, code generation, and commonsense logical reasoning, outperforming previous Qwen models in their respective modes.
  • Human Preference Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following, providing a more natural and engaging conversational experience.
  • Advanced Agent Capabilities: Offers strong tool-calling abilities, achieving leading performance among open-source models in complex agent-based tasks, supported by frameworks like Qwen-Agent.
  • Multilingual Support: Capable of handling over 100 languages and dialects, with robust multilingual instruction following and translation capabilities.

Best Practices for Usage

To achieve optimal performance, specific sampling parameters are recommended:

  • Thinking Mode: Use Temperature=0.6, TopP=0.95, TopK=20, and MinP=0. Avoid greedy decoding.
  • Non-Thinking Mode: Use Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.
  • Output Length: Recommend an output length of 32,768 tokens for most queries, extending to 38,912 for highly complex problems.
  • Output Standardization: Utilize specific prompts for math problems (e.g., "Please reason step by step, and put your final answer within \boxed{}") and multiple-choice questions to standardize responses.