FINAL-Bench/Darwin-9B-NEG

VISIONConcurrency Cost:1Model Size:9BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Apr 24, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

FINAL-Bench/Darwin-9B-NEG is a 9 billion parameter model built on a Qwen3.5-9B backbone, featuring Native Entropy Gating (NEG), a proprietary architectural innovation. This model embeds self-confidence directly into its weights, allowing for self-regulated reasoning without extra inference cost. It achieves 84.34% on the GPQA Diamond PhD-level reasoning benchmark, making it highly effective for graduate-level STEM reasoning and complex chain-of-thought tasks.

Loading preview...

Darwin-9B-NEG: The First Native Entropy Gating Model

Darwin-9B-NEG is a 9 billion parameter model from the FINAL-Bench Darwin series, distinguished by its Native Entropy Gating (NEG) technology. This proprietary Darwin V8 innovation integrates a self-confidence mechanism directly into the model's weights, enabling self-regulated reasoning within a single decoding loop. Unlike external multi-turn iteration techniques, NEG operates with 1x inference cost and activates in less than 5% of generation steps, significantly boosting reasoning accuracy.

Key Capabilities & Differentiators

  • Native Entropy Gating (NEG): An architectural innovation that embeds self-confidence, predicting next-token distribution entropy and guiding token selection. This results in a +12.63% improvement on GPQA Diamond at identical inference cost.
  • High Reasoning Accuracy: Achieves 84.34% on the GPQA Diamond PhD-level reasoning benchmark using a 3-stage ensemble protocol, surpassing the Qwen3.5-9B leaderboard score.
  • Efficient Deployment: NEG modules are carried within the model weights, requiring no extra libraries or complex setup; standard transformers loading is sufficient.
  • Darwin Series Lineage: Built on the Darwin-9B-Opus base, which is a Qwen3.5-family member produced by the Darwin V7 evolutionary breeding engine.

Recommended Use Cases

  • Graduate-level STEM reasoning: Excels in physics, chemistry, biology, and mathematics (GPQA-style).
  • Mathematical problem solving: Suitable for MATH and AIME-style challenges.
  • Code reasoning and debugging: Performs well on HumanEval-style tasks.
  • Complex chain-of-thought tasks: Ideal when a small reasoning model with a significant boost is required.