ermiaazarkhalili/Qwen3-4B-SFT-Claude-Opus-Reasoning-Unsloth

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Context Length: 32k · Published: Apr 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

ermiaazarkhalili/Qwen3-4B-SFT-Claude-Opus-Reasoning-Unsloth is a 4-billion-parameter Qwen3-based language model fine-tuned by ermiaazarkhalili for reasoning distillation, i.e. learning chain-of-thought behavior from the claude-reasoning-distillation dataset. Training used Unsloth, which speeds up fine-tuning and reduces VRAM usage. The model is designed for tasks that require step-by-step reasoning; the base architecture supports a 32k context, while the fine-tune itself used a 2,048-token sequence length.


Overview

This model, developed by ermiaazarkhalili, is a fine-tuned version of the Qwen3-4B base model, specifically optimized for reasoning distillation using chain-of-thought (CoT) learning. It was trained on the claude-reasoning-distillation dataset, which comprises 10,477 samples featuring Claude's reasoning traces with <think> blocks.
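For quick inference, here is a minimal sketch using the Hugging Face transformers library. The repository name comes from this card; the prompt, dtype handling, and generation settings are illustrative assumptions rather than recommended values.

```python
# Minimal inference sketch with Hugging Face transformers.
# Assumes the tokenizer ships the standard Qwen3 chat template;
# the prompt and generation settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ermiaazarkhalili/Qwen3-4B-SFT-Claude-Opus-Reasoning-Unsloth"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user",
             "content": "A train leaves at 9:40 and arrives at 12:05. "
                        "How long is the trip?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The fine-tune emits a step-by-step trace in <think> blocks before the
# answer, so leave enough room in max_new_tokens for both.
output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```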

Key Capabilities

  • Enhanced Reasoning: Specialized in generating step-by-step reasoning processes, making it suitable for complex problem-solving tasks.
  • Efficient Training: Utilizes Unsloth for fine-tuning, achieving 2x faster training and 60% less VRAM consumption compared to standard methods.
  • Qwen3 Architecture: Built upon the Qwen3-4B (Unsloth 4-bit) base, providing a robust foundation for language understanding and generation.
  • QLoRA Fine-tuning: Employs 4-bit QLoRA for efficient adaptation, with a fine-tuned context window of 2,048 tokens (see the training sketch after this list).
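The Unsloth + QLoRA setup described above corresponds roughly to the sketch below. Only the 2,048-token sequence length and 4-bit loading come from this card; the base-repository name, LoRA rank, and target modules are assumptions chosen for illustration.

```python
# Sketch of an Unsloth QLoRA setup like the one described above.
# The base repo name and LoRA hyperparameters are assumed, not confirmed.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B-bnb-4bit",  # assumed 4-bit base repo
    max_seq_length=2048,                     # fine-tuned context window
    load_in_4bit=True,                       # QLoRA: frozen 4-bit base weights
)

# Attach trainable LoRA adapters; rank and target modules are common
# defaults, not necessarily the values used for this model.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```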

Good For

  • Reasoning Tasks: Ideal for applications requiring explicit, step-by-step logical deduction, such as mathematical problems or complex queries.
  • Resource-Efficient Deployment: Its 4B parameter size and Unsloth optimization make it suitable for environments with limited computational resources.
  • Experimentation with CoT: Provides a strong foundation for further research and development in chain-of-thought reasoning models; a sketch for separating the reasoning trace from the answer follows this list.
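Since completions include the reasoning trace, downstream code usually wants to separate it from the final answer. The sketch below is a hedged example: it assumes the model emits a single <think>...</think> block before the answer, matching the training data described above.

```python
# Post-processing sketch: split a completion into its <think> reasoning
# trace and the final answer. Assumes one <think>...</think> block
# precedes the answer, as in the training data described above.
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no block is found."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()
```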