FINAL-Bench/Darwin-9B-NEG
FINAL-Bench/Darwin-9B-NEG is a 9 billion parameter model built on a Qwen3.5-9B backbone, featuring Native Entropy Gating (NEG), a proprietary architectural innovation. The model embeds a self-confidence signal directly into its weights, allowing self-regulated reasoning without extra inference cost. It reports 84.34% on the GPQA Diamond PhD-level reasoning benchmark (using a 3-stage ensemble protocol) and targets graduate-level STEM reasoning and complex chain-of-thought tasks.
Darwin-9B-NEG: The First Native Entropy Gating Model
Darwin-9B-NEG is a 9 billion parameter model from the FINAL-Bench Darwin series, distinguished by its Native Entropy Gating (NEG) technology. This proprietary Darwin V8 innovation integrates a self-confidence mechanism directly into the model's weights, enabling self-regulated reasoning within a single decoding loop. Unlike external multi-turn iteration techniques, NEG operates with 1x inference cost and activates in less than 5% of generation steps, significantly boosting reasoning accuracy.
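The general idea of gating on next-token entropy can be illustrated with a small sketch. This is not the NEG implementation, which is proprietary and lives inside the model weights; it only shows how the Shannon entropy of a next-token distribution can be computed and compared against a threshold. The function names, the threshold value of 1.0 nats, and the toy logits are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_gate(logits, threshold=1.0):
    """Hypothetical gate: fire only when the distribution is uncertain.

    The threshold is an illustrative assumption, not a value from the model card.
    """
    return entropy(softmax(logits)) > threshold

def gate_rate(steps, threshold=1.0):
    """Fraction of decoding steps at which the hypothetical gate would fire."""
    fired = sum(1 for logits in steps if should_gate(logits, threshold))
    return fired / len(steps)

# Toy decoding trace: three confident steps and one uncertain step.
confident = [10.0, 0.0, 0.0, 0.0]
uncertain = [0.0, 0.0, 0.0, 0.0]
trace = [confident, confident, uncertain, confident]
rate = gate_rate(trace)
```

In this toy trace the gate fires on one step out of four. A gate that stays silent on confident steps and fires only on uncertain ones is consistent with the description of a mechanism that activates on a small fraction of generation steps.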
Key Capabilities & Differentiators
- Native Entropy Gating (NEG): An architectural innovation that embeds self-confidence, predicting next-token distribution entropy and guiding token selection. This results in a +12.63% improvement on GPQA Diamond at identical inference cost.
- High Reasoning Accuracy: Achieves 84.34% on the GPQA Diamond PhD-level reasoning benchmark using a 3-stage ensemble protocol, surpassing the Qwen3.5-9B leaderboard score.
- Efficient Deployment: NEG modules are carried within the model weights, requiring no extra libraries or complex setup; standard `transformers` loading is sufficient.
- Darwin Series Lineage: Built on the Darwin-9B-Opus base, which is a Qwen3.5-family member produced by the Darwin V7 evolutionary breeding engine.
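Since the card states that standard `transformers` loading is sufficient, a minimal loading sketch might look like the following. It assumes the repository ID `FINAL-Bench/Darwin-9B-NEG` from the title resolves on the Hugging Face Hub and that the installed `transformers` version supports the Qwen3.5 architecture. The helper names `load_darwin` and `generate`, and the `max_new_tokens` default, are hypothetical; dtype and device placement are left at library defaults and would likely need tuning for a 9B model.

```python
def load_darwin(model_id="FINAL-Bench/Darwin-9B-NEG"):
    """Load tokenizer and model with the standard Auto classes.

    The import is deferred so this module can be imported without
    transformers installed; the model is only downloaded when called.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model

def generate(tokenizer, model, prompt, max_new_tokens=512):
    """Run a single generation pass and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_darwin()` followed by `generate(tokenizer, model, "your question here")`. No NEG-specific arguments appear because, per the card, the gating modules are carried in the weights.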
Recommended Use Cases
- Graduate-level STEM reasoning: Excels in physics, chemistry, biology, and mathematics (GPQA-style).
- Mathematical problem solving: Suitable for MATH and AIME-style challenges.
- Code reasoning and debugging: Performs well on HumanEval-style tasks.
- Complex chain-of-thought tasks: Suited to cases that call for a small model with stronger reasoning accuracy than its base.