ermiaazarkhalili/Qwen3-4B-SFT-Claude-Opus-Reasoning-Unsloth
ermiaazarkhalili/Qwen3-4B-SFT-Claude-Opus-Reasoning-Unsloth is a 4-billion-parameter Qwen3-based language model fine-tuned for reasoning distillation, specifically chain-of-thought learning, on the Claude Reasoning Distillation dataset. Training used Unsloth for efficiency, yielding faster fine-tuning and reduced VRAM usage. The model is designed to excel at tasks requiring step-by-step reasoning and was fine-tuned with a context length of 2,048 tokens.
Overview
This model, developed by ermiaazarkhalili, is a fine-tuned version of the Qwen3-4B base model, specifically optimized for reasoning distillation using chain-of-thought (CoT) learning. It was trained on the claude-reasoning-distillation dataset, which comprises 10,477 samples featuring Claude's reasoning traces with <think> blocks.
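Under this convention, the model emits its reasoning trace inside a <think>...</think> block before the final answer. A minimal sketch of separating the trace from the answer in a generation (the sample string below is invented for illustration, not taken from the dataset):

```python
import re

def split_reasoning(generation: str) -> tuple[str, str]:
    """Split a generation into (reasoning_trace, final_answer).

    Assumes a single Claude-style <think>...</think> block preceding
    the answer; returns an empty trace if no block is present.
    """
    match = re.search(r"<think>(.*?)</think>", generation, flags=re.DOTALL)
    if match is None:
        return "", generation.strip()
    reasoning = match.group(1).strip()
    answer = generation[match.end():].strip()
    return reasoning, answer

# Invented example in the dataset's format:
sample = "<think>2 apples + 3 apples = 5 apples.</think>The answer is 5."
trace, answer = split_reasoning(sample)
```

This kind of post-processing is useful when you want to show users only the final answer while logging the chain of thought separately.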
Key Capabilities
- Enhanced Reasoning: Specialized in generating step-by-step reasoning processes, making it suitable for complex problem-solving tasks.
- Efficient Training: Utilizes Unsloth for fine-tuning, achieving 2x faster training and 60% less VRAM consumption compared to standard methods.
- Qwen3 Architecture: Built upon the Qwen3-4B (Unsloth 4-bit) base, providing a robust foundation for language understanding and generation.
- QLoRA Fine-tuning: Employs 4-bit QLoRA for efficient adaptation, with a fine-tuned context window of 2,048 tokens.
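The setup described above can be sketched as an Unsloth QLoRA configuration. This is a configuration sketch, not the author's exact training script: the base checkpoint name and the LoRA hyperparameters (rank, alpha, target modules) are assumptions beyond what the card states (4-bit QLoRA on a Qwen3-4B Unsloth base, 2,048-token context).

```python
from unsloth import FastLanguageModel

# Load the 4-bit Qwen3-4B base (checkpoint name assumed, not from the card).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B-bnb-4bit",
    max_seq_length=2048,  # fine-tuned context window stated in the card
    load_in_4bit=True,    # QLoRA: frozen 4-bit base weights
)

# Attach trainable LoRA adapters; rank/alpha/targets are illustrative defaults.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```

Because only the low-rank adapters are trained while the base stays in 4-bit, this is where the reduced VRAM footprint comes from.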
Good For
- Reasoning Tasks: Ideal for applications requiring explicit, step-by-step logical deduction, such as mathematical problems or complex queries.
- Resource-Efficient Deployment: Its 4B parameter size and Unsloth optimization make it suitable for environments with limited computational resources.
- Experimentation with CoT: Provides a strong foundation for further research and development in chain-of-thought reasoning models.