ermiaazarkhalili/Qwen3.5-4B-SFT-Claude-Opus-Reasoning-Unsloth

Vision | Concurrency Cost: 1 | Model Size: 4.5B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Apr 25, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

The ermiaazarkhalili/Qwen3.5-4B-SFT-Claude-Opus-Reasoning-Unsloth model is a 4.5 billion parameter fine-tune of Qwen3.5-4B developed by ermiaazarkhalili. It is fine-tuned for reasoning distillation, learning chain-of-thought behavior from Claude's reasoning traces. Training used Unsloth for faster training and reduced VRAM use, and the model targets tasks that require step-by-step problem-solving and logical deduction.

Model Overview

This model is a fine-tuned version of the Qwen3.5-4B base model, optimized for reasoning distillation through chain-of-thought (CoT) learning. It was trained with the Unsloth framework, which is reported to have made training 2x faster and reduced VRAM consumption by 60%.

Key Capabilities & Training Details

  • Reasoning Distillation: The model was trained on the claude-reasoning-distillation dataset, comprising 10,477 samples of Claude's reasoning traces with <think> blocks, enhancing its ability to perform step-by-step problem-solving.
  • Efficient Fine-tuning: Trained with supervised fine-tuning (SFT) using QLoRA (4-bit) and Unsloth's optimizations, which keeps training time and memory requirements low.
  • Base Model: Built on Qwen3.5-4B, a nominally 4 billion parameter model (the listing reports 4.5B total parameters).
  • Context Length: Fine-tuned with a 2,048-token context window, shorter than the 32k context length listed for the model.
  • GGUF Availability: Quantized GGUF versions are provided for CPU and edge inference, supporting formats like Q4_K_M, Q5_K_M, and Q8_0.
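
To give a rough sense of what the listed formats mean for memory, the sketch below estimates the size of the weights alone from the 4.5B parameter count. The BF16 figure follows directly from two bytes per weight. The bits-per-weight values for Q5_K_M and Q4_K_M are approximate, commonly quoted averages and are assumptions here, not measurements of this model's GGUF files. Real files also carry metadata, and inference needs extra memory for the KV cache and activations.

```python
# Rough weight-memory estimate for a 4.5B-parameter model at several precisions.
# The K-quant bits-per-weight values are approximate averages (assumptions),
# not measurements of this model's published files.

PARAMS = 4.5e9  # total parameter count from the model listing

BITS_PER_WEIGHT = {
    "BF16": 16.0,    # exact: two bytes per weight
    "Q8_0": 8.5,     # 32 int8 weights plus one 16-bit scale per block
    "Q5_K_M": 5.7,   # approximate average (assumption)
    "Q4_K_M": 4.85,  # approximate average (assumption)
}


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Return the size of the weights alone in decimal gigabytes (1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def estimate_all(params: float = PARAMS) -> dict[str, float]:
    """Estimate weight size in GB for every format in BITS_PER_WEIGHT."""
    return {
        name: round(weight_gigabytes(params, bpw), 2)
        for name, bpw in BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        print(f"{name:>7}: ~{gb:.2f} GB")
```

Under these assumptions the BF16 weights come to about 9 GB, and the quantized variants come out at roughly a half to a third of that, which is why the GGUF files are the ones aimed at CPU and edge inference.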

Use Cases & Limitations

This model is suited to applications that need detailed, step-by-step reasoning, such as mathematical problem-solving, logic puzzles, or any task where an explicit thought process is beneficial. It was trained primarily on English data, and its knowledge cutoff is that of its base model. It has not undergone extensive safety tuning, so users should expect possible hallucinations and add their own safety guardrails.
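
Applications that consume this kind of output usually need to separate the reasoning trace from the final answer. The following is a minimal sketch, assuming the model wraps its trace in <think>...</think> tags as in its training data. The helper name `split_reasoning` and the sample string are hypothetical and used only for illustration.

```python
import re

# Matches a complete <think>...</think> block; DOTALL lets "." span newlines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw model output.

    Assumes the trace is wrapped in <think>...</think>. If no complete block
    is found, the reasoning is empty and the whole text is the answer.
    """
    blocks = THINK_BLOCK.findall(text)
    # Join multiple traces, if any, with a blank line between them.
    reasoning = "\n\n".join(block.strip() for block in blocks)
    # Whatever remains after removing the traces is treated as the answer.
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


# Hypothetical sample output, not produced by the model.
sample = "<think>\n17 * 3 = 51, and 51 + 4 = 55.\n</think>\nThe result is 55."
reasoning, answer = split_reasoning(sample)
print(answer)
```

A trace that is cut off before its closing tag, for example when generation hits the token limit, is left in the answer by this sketch. A production pipeline would likely want to detect that case and handle it explicitly.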