Michael-Kozu/Deimos-A4
Michael-Kozu/Deimos-A4 is a 4.7 billion parameter causal language model built on Qwen/Qwen3.5-4B, featuring a 32,768 token context length. It is specifically optimized for complex reasoning and hard mathematical tasks, utilizing a compressed, internal chain-of-thought mechanism. This model achieves significant reductions in token usage and faster inference times while demonstrating substantial accuracy gains on benchmarks like AIME and MATH-hard compared to its base model.
Loading preview...
Deimos-A4: A Reasoning Specialist
Deimos-A4 is a 4.7 billion parameter model, fine-tuned from Qwen/Qwen3.5-4B, designed to excel in complex reasoning and hard mathematical problems. It introduces an innovative internal "concise chain-of-thought" (CCoT) mechanism, where the model generates compact reasoning within <think>...</think> blocks before producing a clean, professional response. This process leads to approximately 60% fewer tokens and 36% faster inference compared to the base model, alongside a +40 point average accuracy increase on challenging math benchmarks.
Key Capabilities
- Compressed Reasoning: Emits terse, fragment-style reasoning internally, which is then expanded into a polished, user-facing output.
- Enhanced Mathematical Performance: Achieves significant accuracy gains on hard math tasks (AIME, MATH-hard, MATH-500) and multi-step proofs.
- Token Efficiency: Reduces output length by up to 77% on certain math problems, leading to faster processing.
- Configurable Thinking Mode: A runtime toggle (
enable_thinking) allows users to activate or deactivate the internal reasoning trace, optimizing performance for different task types.
Good For
- Hard Math & Logic: Ideal for AIME, MATH-hard, MATH-500, multi-step proofs, and long algebraic chains.
- Code Generation: Benefits from the reasoning capabilities for complex coding problems.
- Open-ended Reasoning: Tasks requiring deep logical deduction.
Limitations
- Knowledge Regression: May perform less effectively on general knowledge recall (MMLU) and strict instruction-following tasks compared to the base model, as its compression is not beneficial here.
- Internal Reasoning: The concise reasoning fragments are designed to remain internal; stripping the chat template or forcing generation outside the
<think>block may degrade results.