Michael-Kozu/Deimos-A4

VISIONConcurrency Cost:1Model Size:4.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 3, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Michael-Kozu/Deimos-A4 is a 4.7 billion parameter causal language model built on Qwen/Qwen3.5-4B, featuring a 32,768 token context length. It is specifically optimized for complex reasoning and hard mathematical tasks, utilizing a compressed, internal chain-of-thought mechanism. This model achieves significant reductions in token usage and faster inference times while demonstrating substantial accuracy gains on benchmarks like AIME and MATH-hard compared to its base model.

Loading preview...

Deimos-A4: A Reasoning Specialist

Deimos-A4 is a 4.7 billion parameter model, fine-tuned from Qwen/Qwen3.5-4B, designed to excel in complex reasoning and hard mathematical problems. It introduces an innovative internal "concise chain-of-thought" (CCoT) mechanism, where the model generates compact reasoning within <think>...</think> blocks before producing a clean, professional response. This process leads to approximately 60% fewer tokens and 36% faster inference compared to the base model, alongside a +40 point average accuracy increase on challenging math benchmarks.

Key Capabilities

  • Compressed Reasoning: Emits terse, fragment-style reasoning internally, which is then expanded into a polished, user-facing output.
  • Enhanced Mathematical Performance: Achieves significant accuracy gains on hard math tasks (AIME, MATH-hard, MATH-500) and multi-step proofs.
  • Token Efficiency: Reduces output length by up to 77% on certain math problems, leading to faster processing.
  • Configurable Thinking Mode: A runtime toggle (enable_thinking) allows users to activate or deactivate the internal reasoning trace, optimizing performance for different task types.

Good For

  • Hard Math & Logic: Ideal for AIME, MATH-hard, MATH-500, multi-step proofs, and long algebraic chains.
  • Code Generation: Benefits from the reasoning capabilities for complex coding problems.
  • Open-ended Reasoning: Tasks requiring deep logical deduction.

Limitations

  • Knowledge Regression: May perform less effectively on general knowledge recall (MMLU) and strict instruction-following tasks compared to the base model, as its compression is not beneficial here.
  • Internal Reasoning: The concise reasoning fragments are designed to remain internal; stripping the chat template or forcing generation outside the <think> block may degrade results.