FINAL-Bench/Darwin-31B-Opus

VISIONConcurrency Cost:2Model Size:31BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Apr 6, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

Darwin-31B-Opus is a 31 billion parameter reasoning-enhanced language model developed by VIDRAFT, built on the Gemma 4 Dense architecture with a 32768 token context length. It was created using the Darwin V6 diagnostic-guided evolutionary merge engine, combining google/gemma-4-31B-it with TeichAI/gemma-4-31B-it-Claude-Opus-Distill. This model excels in complex reasoning tasks, achieving 85.9% on the GPQA Diamond benchmark with its Darwin-DELPHI test-time engine, and supports over 140 languages.

Loading preview...

Overview

Darwin-31B-Opus is a 31 billion parameter model developed by VIDRAFT, leveraging the Gemma 4 Dense architecture with a 32768 token context window. It is distinguished by its unique creation process using the Darwin V6 engine, which performs a diagnostic-guided evolutionary merge of two parent models: google/gemma-4-31B-it and TeichAI/gemma-4-31B-it-Claude-Opus-Distill.

Key Capabilities & Features

  • Advanced Reasoning: Achieves 85.9% on the GPQA Diamond benchmark (a PhD-level science reasoning test) when paired with the Darwin-DELPHI test-time reasoning engine.
  • Diagnostic-Guided Merging: The Darwin V6 engine diagnoses parent models at the tensor level, assigning independent optimal ratios to 1,188 tensors, a significant departure from conventional uniform merging methods.
  • Multilingual Support: Capable of processing over 140 languages.
  • Optimized for Reasoning: The merge process specifically favored the 'Mother' model (Claude-Opus-Distill) for its strong reasoning concentration in later layers, while retaining the 'Father' model's (Gemma 4) attention structure for multimodal and long-context capabilities.
  • Chain-of-Thought: Supports enable_thinking=True for chain-of-thought reasoning.

When to Use This Model

  • For applications requiring high-level scientific or complex reasoning.
  • When seeking a model with strong performance on challenging benchmarks like GPQA Diamond.
  • For tasks benefiting from a long context window (32768 tokens) and multilingual capabilities.
  • Developers interested in advanced model merging techniques and their practical applications.