Jackrong/GPT-Distill-Qwen3-8B-Thinking
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32K · License: apache-2.0 · Architecture: Transformer · Open Weights

Jackrong/GPT-Distill-Qwen3-8B-Thinking is an 8-billion-parameter instruction-tuned language model based on Qwen3-8B, with a 32K-token context length. It is designed for complex reasoning and instruction following, having been distilled from teacher models of 120B+ parameters. Its key differentiator is a "Thinking" capability: the model emits an explicit chain-of-thought inside `<think>...</think>` tags before its final answer, which improves performance on math, logic, and scientific tasks. The result is an 8B model that reproduces the reasoning patterns of much larger models at a fraction of the inference cost.
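
Below is a minimal usage sketch with Hugging Face `transformers`, assuming the repository ships standard tokenizer and config files and follows the usual Qwen3 chat template; the prompt and generation parameters are illustrative:

```python
# Minimal sketch: load the model and separate its <think> reasoning
# from the final answer. Assumes transformers + accelerate are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Jackrong/GPT-Distill-Qwen3-8B-Thinking"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick dtype from the checkpoint (e.g. FP8/BF16)
    device_map="auto",    # place layers on available GPU(s)
)

# Build a chat-formatted prompt (illustrative question).
messages = [{"role": "user", "content": "What is 17 * 23? Explain your steps."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# The response should contain the chain-of-thought inside <think>...</think>,
# followed by the final answer; split them apart for display.
if "</think>" in response:
    thinking, answer = response.split("</think>", 1)
    print("Reasoning:", thinking.replace("<think>", "").strip())
    print("Answer:", answer.strip())
else:
    print(response)
```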
