ermiaazarkhalili/Qwen3.5-9B-SFT-Claude-Opus-Reasoning-Unsloth
ermiaazarkhalili/Qwen3.5-9B-SFT-Claude-Opus-Reasoning-Unsloth is a 9 billion parameter Qwen3.5-based language model fine-tuned by ermiaazarkhalili. It is optimized for reasoning distillation and chain-of-thought learning, and was trained with the Unsloth framework on Claude's reasoning traces. The model targets tasks that require step-by-step problem-solving and logical deduction. It was fine-tuned with a 2,048-token context window and is available in GGUF formats for CPU and edge deployment.
Model Overview
This model, ermiaazarkhalili/Qwen3.5-9B-SFT-Claude-Opus-Reasoning-Unsloth, is a 9 billion parameter Qwen3.5 variant developed by ermiaazarkhalili. It was fine-tuned using the Unsloth framework, which reportedly enabled 2x faster training and 60% lower VRAM consumption. The primary objective of the fine-tuning was to improve the model's reasoning capabilities through distillation.
Key Capabilities & Training Details
- Reasoning Distillation: Optimized for chain-of-thought learning by training on the claude-reasoning-distillation dataset, which includes 10,477 samples of Claude's reasoning traces with <think> blocks.
- Efficient Fine-tuning: Utilizes Unsloth and QLoRA (4-bit) for efficient SFT, targeting modules like q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj.
- Context Length: Fine-tuned with a 2,048-token context window.
- GGUF Availability: Quantized GGUF versions (e.g., Q4_K_M, Q5_K_M, Q8_0) are provided for CPU and edge inference, compatible with tools like Ollama and llama.cpp.
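Because the training samples contain <think> blocks, completions from this model will likely carry a reasoning trace ahead of the final answer. The following is a minimal sketch of how a caller might separate the two. The helper name split_reasoning and the sample string are hypothetical, and the code assumes at most one well-formed <think>...</think> pair per completion, which the model card does not guarantee.

```python
import re

# Assumption: the completion holds at most one <think>...</think> block.
# DOTALL lets the reasoning trace span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw completion.

    If no <think> block is found, reasoning is an empty string and the
    whole completion is treated as the answer.
    """
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the matched block is kept as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hypothetical completion, written by hand for illustration only.
sample = "<think>2 + 2 is 4, then double it.</think>\nThe answer is 8."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the trace separate lets an application log or display the reasoning while passing only the answer downstream. If the output is truncated before the closing tag, this sketch falls back to returning the full text as the answer.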
Ideal Use Cases
- Complex Problem Solving: Suited for tasks that benefit from explicit, step-by-step reasoning.
- Educational Applications: Can be used for generating detailed explanations or solving logical puzzles.
- Research & Development: A strong base for further experimentation in reasoning-focused AI applications.
Limitations
- Primarily trained on English data.
- Knowledge is limited to the base model's training data; fine-tuning did not extend the knowledge cutoff.
- Not extensively safety-tuned; sensitive applications require external guardrails.