EphAsad/Atem-1.7B
EphAsad/Atem-1.7B is a 1.7 billion parameter reasoning model, fine-tuned from Qwen3-1.7B using a single CoT-preserving SFT pass. It distills multi-domain reasoning capabilities from frontier teacher models, maintaining the base model's native thinking. Optimized for compute efficiency, it excels at multi-step mathematical, code, and analytical reasoning tasks on resource-constrained hardware.
Loading preview...
Atem-1.7B: A Compute-Efficient Reasoning Model
Atem-1.7B is a 1.7 billion parameter model developed by EphAsad, fine-tuned from Qwen3-1.7B. It utilizes a unique single CoT-preserving Supervised Fine-Tuning (SFT) pass to distill advanced multi-domain reasoning capabilities from larger teacher models, while preserving the base model's inherent intelligence. This approach avoids an 'erase-then-rebuild' pipeline, building reasoning directly on the base model's foundation.
Key Capabilities & Features
- Optimized Reasoning: Designed for multi-step mathematical reasoning, code explanation and debugging, analytical reasoning, commonsense reasoning, and logic evaluation.
- Compute Efficiency: The most compute-efficient model in the Atem series, completing training in under 2.5 hours on an A100-SXM4 80GB.
- CoT-Preserving SFT: Employs a single-pass design to integrate Chain-of-Thought reasoning without overwriting the base model's native thinking.
- GSM8K Format Restoration: Includes 5,000 GSM8K-format training examples to partially restore the
#### answerconvention, addressing a known formatting shift from the primary\boxed{}notation. - Full 16-bit LoRA: Utilizes full 16-bit LoRA for improved accuracy and speed, leveraging ample VRAM headroom on target hardware.
Performance Highlights
While benchmark deltas are generally modest at this scale, Atem-1.7B shows a statistically significant +1.9pp improvement on HellaSwag (2.8σ), indicating genuine commonsense reasoning transfer. It also shows a directional +3.0pp on OpenBookQA.
Ideal Use Cases
Atem-1.7B is particularly suited for reasoning tasks on resource-constrained hardware, such as edge devices or local deployments, where larger models (4B+) are impractical. It offers a strong balance of reasoning capability and efficiency for applications requiring analytical thought processes.