Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled
Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled is a 4.5 billion parameter model developed by Jackrong, fine-tuned on the Qwen3.5-4B architecture. It specializes in structured reasoning and problem-solving, leveraging Chain-of-Thought (CoT) distillation from Claude-4.6 Opus interactions. The model excels at breaking down complex problems and delivering precise solutions, making it suitable for tasks requiring transparent, step-by-step logic.
Overview
Jackrong/Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled is a 4.5 billion parameter language model developed by Jackrong and built on the Qwen3.5-4B architecture. It is fine-tuned for reasoning using Chain-of-Thought (CoT) distillation from high-quality Claude-4.6 Opus interactions. The goal is structured, step-by-step problem-solving: the model is trained to work through an internal thinking process inside <think> tags before it generates a final answer.
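Because the reasoning is wrapped in <think> tags, downstream code usually wants to separate the trace from the final answer. The following is a minimal sketch, assuming the output contains at most one <think>...</think> block followed by the answer. The function name split_reasoning and the sample string are illustrative and do not come from the model card.

```python
import re
from typing import Tuple

# Matches the first complete <think>...</think> block, including newlines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from raw model output (hypothetical helper)."""
    match = THINK_BLOCK.search(text)
    if match is None:
        # No complete think block found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Made-up output, used only to exercise the helper.
sample = "<think>Let me analyze this request carefully: 1. 2 + 2 = 4.</think>The answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

If generation is cut off before the closing tag, this sketch returns the whole text as the answer, so callers may want to handle truncated output separately.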
Key Capabilities
- Structured Reasoning: Employs a streamlined reasoning paradigm, adopting an efficient "Let me analyze this request carefully: 1..2..3..." pattern to reduce redundant cognitive loops.
- CoT Distillation: Leverages Supervised Fine-Tuning (SFT) with response-only training, masking instructions to focus loss calculation purely on the generation of <think> sequences and subsequent solutions.
- Enhanced Reasoning Data: Further improved with additional reasoning data distilled from Qwen3.5-27B, including datasets like Jackrong/Qwen3.5-reasoning-700x, to strengthen structured problem-solving and reasoning diversity.
- Extended Context: Supports a 16,384 token context window, allowing for complex multi-step reasoning traces.
- Performance: Reports improved performance over baseline 4B models, scoring 38.88% on GPQA Diamond and 66.38% on AI2 ARC-Challenge.
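The response-only training mentioned in the list above can be sketched as label masking. This is a minimal illustration, not the author's training code. It assumes the common convention of setting prompt labels to -100, the default ignore index of PyTorch's cross-entropy loss, so only response tokens contribute to the loss. The helper name mask_prompt_labels and the token IDs are made up.

```python
from typing import List, Tuple

# Label value skipped by PyTorch's cross-entropy loss by default (assumed convention).
IGNORE_INDEX = -100


def mask_prompt_labels(
    prompt_ids: List[int], response_ids: List[int]
) -> Tuple[List[int], List[int]]:
    """Build (input_ids, labels) so that only response tokens carry loss."""
    input_ids = prompt_ids + response_ids
    # Instruction tokens are masked out; response tokens keep their own IDs as labels.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Hypothetical token IDs: three instruction tokens, then two response tokens.
input_ids, labels = mask_prompt_labels([11, 12, 13], [21, 22])
print(input_ids)
print(labels)
```

In this setup the response would include both the <think> sequence and the final solution, so both are supervised while the instruction is not.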
Good For
- Offline analytical tasks requiring transparent, step-by-step logic.
- Coding and mathematical problem-solving.
- Logic-heavy prompting where seeing the model's internal thought process is crucial.
Limitations
- As an autoregressive LLM, it carries a risk of hallucination, especially when verifying real-world events within its thinking sequence.
- Intended primarily for academic research and technical exploration.