cs-552-2026-middle-west/group_model

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:May 12, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The Qwen3-1.7B model, developed by Qwen, is a 1.7 billion parameter causal language model with a 32,768 token context length. It uniquely supports seamless switching between a 'thinking mode' for complex reasoning, math, and coding, and a 'non-thinking mode' for efficient general dialogue. This model excels in reasoning capabilities, human preference alignment for creative writing and role-playing, and agentic tasks with external tool integration, supporting over 100 languages.

Loading preview...

Qwen3-1.7B Model Overview

Qwen3-1.7B is a 1.7 billion parameter causal language model from the Qwen series, designed with a substantial 32,768 token context length. A key differentiator is its dynamic 'thinking' and 'non-thinking' modes, allowing the model to adapt its processing for different task complexities. The 'thinking mode' is optimized for intricate logical reasoning, mathematical problems, and code generation, while the 'non-thinking mode' handles general-purpose dialogue efficiently.

Key Capabilities

  • Enhanced Reasoning: Demonstrates significant improvements in mathematics, code generation, and commonsense logical reasoning, outperforming previous Qwen models.
  • Human Preference Alignment: Excels in creative writing, role-playing, and multi-turn dialogues, providing a more natural conversational experience.
  • Agentic Functionality: Offers strong capabilities for integrating with external tools, achieving leading performance in complex agent-based tasks among open-source models.
  • Multilingual Support: Supports over 100 languages and dialects, with robust multilingual instruction following and translation abilities.

Usage Recommendations

Developers can switch between thinking modes using the enable_thinking parameter in the tokenizer or dynamically within prompts using /think and /no_think tags. Optimal sampling parameters are provided for each mode to prevent issues like endless repetitions. The model also supports deployment via sglang and vllm for OpenAI-compatible API endpoints and is compatible with local applications like Ollama and LMStudio.