EphAsad/Atem-0.6B
Atem-0.6B by EphAsad is a 0.6 billion parameter reasoning model built on Qwen/Qwen3-0.6B, fine-tuned via multi-source knowledge distillation. It specializes in providing concise, directly-formatted answers for tasks like code explanation, mathematical problem-solving, and analytical reasoning. This model is optimized for lightweight, open-ended reasoning where speed and low compute cost are prioritized over deep, multi-step chain-of-thought processes.
Loading preview...
Overview
Atem-0.6B is a 0.6 billion parameter reasoning model developed by EphAsad, fine-tuned from Qwen/Qwen3-0.6B using LoRA. It was trained on approximately 120,000 distilled examples from multiple frontier teacher models, with a focus on producing clean, directly-formatted final answers by suppressing explicit chain-of-thought traces. This model represents Stage 1 of a multi-stage training series, laying a foundation for more complex reasoning capabilities in future iterations.
Key Capabilities
- Direct Reasoning: Provides concise and structured answers for various analytical tasks.
- Code Assistance: Excels in code explanation, implementation, and debugging.
- Mathematical Problem Solving: Capable of solving mathematical problems with working shown, demonstrating a notable gain on GSM8K benchmarks due to its direct answer formatting.
- Analytical Tasks: Suitable for analytical reasoning, hypothesis evaluation, and concept explanation.
Intended Use Cases
Atem-0.6B is designed for scenarios requiring efficient, low-compute reasoning, where direct and structured outputs are beneficial. It is particularly well-suited for:
- Lightweight, open-ended reasoning tasks.
- Applications where speed and a small footprint are more critical than deep, multi-step reasoning on complex problems.
- Tasks that benefit from suppressed thinking traces, leading to more direct responses.
Limitations
As a Stage 1 model, Atem-0.6B deliberately suppresses thinking traces, which can lead to reduced accuracy on multi-step problems where the base model's exposed reasoning might self-correct. Its 0.6B parameter count also means a smaller capability ceiling compared to larger models, and it may exhibit mathematical precision issues on complex calculations without a scratchpad.