ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth is an 8 billion parameter Qwen3-based language model fine-tuned by ermiaazarkhalili. It is optimized for reasoning distillation and chain-of-thought learning, and was trained with the Unsloth framework for efficiency. The model is designed to generate explicit step-by-step reasoning, making it suitable for complex problem-solving tasks. Fine-tuning used Claude's reasoning traces at a 2,048-token sequence length.


Overview

This model, developed by ermiaazarkhalili, is a fine-tuned version of the 8 billion parameter Qwen3 base model, specifically optimized for reasoning tasks. It leverages the Unsloth framework for efficient training, achieving 2x faster training and 60% less VRAM usage compared to standard methods.
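
A minimal inference sketch follows, assuming the checkpoint loads through standard Hugging Face Transformers APIs and ships with the Qwen3 chat template; the prompt, sampling settings, and the `</think>` parsing are illustrative assumptions rather than documented usage.

```python
# Minimal sketch (assumed usage): load the fine-tuned checkpoint with
# Hugging Face Transformers and ask for a step-by-step answer.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "user",
     "content": "A train covers 120 km in 1.5 hours. What is its average speed? Think step by step."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
completion = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# The fine-tuning data wraps reasoning in <think> blocks, so the final answer
# usually follows the closing </think> tag (assumption: the tag survives decoding).
answer = completion.split("</think>")[-1].strip()
print(answer)
```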

Key Capabilities

  • Reasoning Distillation: Fine-tuned on the claude-reasoning-distillation dataset, which includes 10,477 samples of Claude's reasoning traces with <think> blocks, enabling robust chain-of-thought capabilities.
  • Efficient Training: Utilizes Unsloth and QLoRA (4-bit) for resource-efficient fine-tuning, making it accessible to developers with limited hardware (a minimal training sketch follows this list).
  • Optimized for Step-by-Step Problem Solving: Designed to generate detailed, step-by-step solutions, mimicking advanced reasoning processes.
  • Flexible Deployment: Available in multiple formats, including Hugging Face Transformers, Unsloth's optimized inference, 4-bit quantized inference, and GGUF builds for CPU/edge devices (Q4_K_M, Q5_K_M, Q8_0); see the llama.cpp-based sketch after this list.
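
A hedged sketch of the Unsloth + QLoRA (4-bit) recipe described above: only the 2,048-token sequence length and the 4-bit setup come from the card, while the base checkpoint name, dataset path, text field, LoRA settings, and trainer hyperparameters are assumptions; the SFTTrainer arguments follow the older TRL signature used in Unsloth's examples.

```python
# Hedged fine-tuning sketch, not the author's exact recipe.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-8B",      # assumed base checkpoint
    max_seq_length=2048,             # matches the 2,048-token training length
    load_in_4bit=True,               # QLoRA-style 4-bit quantization
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# The card references a "claude-reasoning-distillation" dataset of 10,477
# Claude reasoning traces; the exact hub path and text field are not given.
dataset = load_dataset("claude-reasoning-distillation", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",       # assumed field holding the <think> traces
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="qwen3-8b-reasoning-sft",
    ),
)
trainer.train()
```

For the GGUF builds, a hypothetical CPU/edge example with llama-cpp-python is shown below; the local file name is a placeholder, and the chat-completion call assumes the GGUF carries a usable chat template.

```python
# Hypothetical CPU/edge inference with one of the GGUF quantizations
# (Q4_K_M shown) via llama-cpp-python; the file name is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth.Q4_K_M.gguf",
    n_ctx=2048,       # reasoning traces were trained at a 2,048-token length
    n_threads=8,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain step by step why 97 is prime."}],
    max_tokens=512,
    temperature=0.6,
)
print(out["choices"][0]["message"]["content"])
```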

Good For

  • Applications requiring detailed, step-by-step reasoning and problem-solving.
  • Tasks benefiting from chain-of-thought prompting.
  • Developers seeking an efficiently trained model for reasoning with reduced VRAM requirements.
  • Use cases where reasoning fits comfortably within the 2,048-token sequence length used during fine-tuning.

Limitations

  • Primarily trained on English data.
  • Knowledge cutoff is limited to the base model's training data.
  • May exhibit hallucinations and is not extensively safety-tuned, requiring appropriate guardrails in deployment.