hareeswar/Distilled-Qwen-1.5B-Coder
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 15, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights
The hareeswar/Distilled-Qwen-1.5B-Coder is a 1.5 billion parameter language model distilled from a 7B Qwen2.5-Coder teacher model. It is specifically fine-tuned for autonomous code generation, demonstrating a significant improvement in coding capabilities through a "think-before-acting" reasoning paradigm. This model excels at solving complex algorithmic problems and generating robust Python code.
Distilled-Qwen-1.5B-Coder: Enhanced Code Generation
This model, hareeswar/Distilled-Qwen-1.5B-Coder, is a 1.5 billion parameter language model that has undergone a reasoning distillation process. It was fine-tuned using Chain-of-Thought (CoT) outputs from a larger 7 billion parameter Qwen2.5-Coder teacher model, specifically to improve its autonomous coding capabilities.
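A minimal way to try the model is through the Hugging Face transformers API. The snippet below is a sketch, assuming the checkpoint loads as a standard causal language model; the prompt wording and generation settings are illustrative, not prescribed by this card.

```python
# Minimal usage sketch (assumes transformers + accelerate are installed and
# the checkpoint follows the standard causal-LM layout).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hareeswar/Distilled-Qwen-1.5B-Coder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # card lists BF16 weights
    device_map="auto",
)

prompt = "Write a Python function that returns the length of the longest increasing subsequence of a list of integers."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```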
Key Capabilities & Performance
- Significant Improvement: Achieved a +15.3% absolute improvement in autonomous coding pass rates compared to its base 1.5B model.
- Reasoning Paradigm: The distillation process, which injects [REASONING] tokens during Supervised Fine-Tuning (SFT), successfully instilled a "think-before-acting" approach (a post-processing sketch follows this list).
- Algorithmic Problem Solving: Demonstrates near-perfect scores (95%+) on complex algorithmic edge cases and actively deconstructs problems before generating Python code.
- Performance Metrics: The distilled model reached an average pass rate of 79.8%, a substantial increase from the base model's 64.5%.
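Because the SFT data injects [REASONING] tokens, generations may open with an explicit reasoning trace before the final code. The helper below is a hypothetical post-processing sketch: the opening marker comes from this card, but the closing marker and overall output format are assumptions and should be adjusted to whatever the model actually emits.

```python
# Hypothetical post-processing: split a reasoning trace from the final answer.
# The closing tag below is an assumption, not documented behavior of this model.
REASONING_OPEN = "[REASONING]"
REASONING_CLOSE = "[/REASONING]"  # assumed closing marker; adjust to the real format


def split_reasoning(generated: str) -> tuple[str, str]:
    """Return (reasoning_trace, answer); the trace is empty if no markers are found."""
    if REASONING_OPEN in generated and REASONING_CLOSE in generated:
        before, rest = generated.split(REASONING_OPEN, 1)
        reasoning, answer = rest.split(REASONING_CLOSE, 1)
        return reasoning.strip(), (before + answer).strip()
    return "", generated.strip()


sample = "[REASONING] Handle the empty list, then run DP over prefix lengths. [/REASONING]\ndef lis(nums): ..."
reasoning, answer = split_reasoning(sample)
```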
When to Use This Model
- Code Generation: Ideal for tasks requiring robust and logically sound Python code generation.
- Algorithmic Challenges: Particularly effective for solving complex algorithmic problems, including those involving dynamic programming and boundary checks.
- Resource-Constrained Environments: Offers enhanced coding performance in a smaller 1.5B parameter footprint, making it suitable for scenarios where larger models are impractical (see the footprint estimate below).
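As a rough back-of-the-envelope estimate, 1.5 billion parameters stored in BF16 (2 bytes per parameter) amount to about 3 GB of weights, before activation and KV-cache overhead, so the model can fit on consumer GPUs and modest hosts where the 7B teacher would be impractical.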