FINAL-Bench/Darwin-4B-Genesis

VISION | Concurrency Cost: 1 | Model Size: 7.9B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Apr 10, 2026 | License: apache-2.0 | Architecture: Transformer | 0.0K | Open Weights | Cold

Darwin-4B-Genesis is a 7.9 billion parameter language model developed by FINAL-Bench and the third generation of the Darwin family. It is presented as the world's first model to successfully crossbreed FFN layers from different architectures (Gemma4 Transformer and Qwen3.5 Mamba) using evolutionary optimization, without additional training. The model demonstrates "Hybrid Vigor": it outperforms both parent models on benchmarks such as CLIcK (92%) and MuSR (70%), which makes it a candidate for advanced reasoning tasks.


Darwin-4B-Genesis: Evolutionary Cross-Architecture FFN Breeding

Darwin-4B-Genesis, developed by FINAL-Bench, is a 7.9 billion parameter model and the third generation in the Darwin family. It is presented as the world's first model to successfully crossbreed FFN layers from different architectures (specifically, Gemma4 Transformer and Qwen3.5 Mamba) using evolutionary optimization. The process transplants the mother's (Qwen3.5 Mamba) FFN knowledge at layer-specific ratios discovered by CMA-ES, while preserving the father's (Gemma4 Transformer) attention layers.
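
The per-layer transplant described above can be pictured with a small sketch. It rests on assumptions that are not confirmed by the model card: that "transplanting at a ratio" means linear interpolation of FFN weights, and that both parents' FFN weights have matching shapes. The names `blend_ffn` and `breed`, and the dictionary layout of a layer, are hypothetical and chosen only for illustration; weights are plain Python lists to keep the example dependency-free.

```python
def blend_ffn(father_ffn, mother_ffn, ratio):
    """Linearly interpolate two equally sized FFN weight vectors.

    ratio = 0.0 keeps the father's FFN, ratio = 1.0 takes the mother's.
    Linear interpolation is an assumption, not the documented method.
    """
    if len(father_ffn) != len(mother_ffn):
        raise ValueError("FFN weights must have matching shapes")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return [(1.0 - ratio) * f + ratio * m for f, m in zip(father_ffn, mother_ffn)]


def breed(father_layers, mother_layers, ratios):
    """Build child layers: father's attention untouched, FFN blended per layer."""
    if not len(father_layers) == len(mother_layers) == len(ratios):
        raise ValueError("need one mother layer and one ratio per father layer")
    child = []
    for father, mother, ratio in zip(father_layers, mother_layers, ratios):
        child.append({
            "attention": list(father["attention"]),  # copied from the father as-is
            "ffn": blend_ffn(father["ffn"], mother["ffn"], ratio),
        })
    return child


# Toy two-layer parents (made-up numbers, not real weights).
father = [
    {"attention": [1.0, 1.0], "ffn": [0.0, 2.0]},
    {"attention": [3.0, 3.0], "ffn": [4.0, 4.0]},
]
mother = [
    {"ffn": [2.0, 0.0]},
    {"ffn": [0.0, 8.0]},
]

child = breed(father, mother, ratios=[0.5, 0.25])
print(child[0]["ffn"])  # [1.0, 1.0]
print(child[1]["ffn"])  # [3.0, 5.0]
```

Each layer gets its own ratio, so the search only has to choose one number per blended layer rather than touch individual weights.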

Key Capabilities & Innovations

  • Cross-Architecture FFN Breeding: Merges components from pre-trained Transformer and Mamba models without requiring additional training, a significant departure from existing hybrid models designed from scratch.
  • Demonstrated Hybrid Vigor: The resulting model, Darwin-4B-Genesis, outperforms both parent models on key benchmarks, achieving 92% on CLIcK (Korean culture) and 70% on MuSR (multi-step reasoning).
  • Evolutionary Optimization: Uses CMA-ES to search a 42-dimensional genome for the optimal FFN blending ratios. The search blended Qwen weights most aggressively in the final layers, which govern output quality.
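
To give a feel for the genome search mentioned in the last bullet, here is a minimal sketch of an evolutionary loop over 42 blending ratios. It is deliberately not CMA-ES: it is a simplified evolution strategy that samples around a mean and averages the best candidates, whereas full CMA-ES also adapts a covariance matrix and step size. The fitness function, the ramp-shaped target, and all hyperparameters are hypothetical placeholders; a real run would score each genome by building the blended model and evaluating it on benchmarks.

```python
import random

GENOME_DIM = 42  # genome size stated in the model card


def toy_fitness(genome, target):
    """Hypothetical stand-in for a benchmark score (higher is better)."""
    return -sum((g - t) ** 2 for g, t in zip(genome, target))


def clip01(x):
    """Keep a blending ratio inside [0, 1]."""
    return min(1.0, max(0.0, x))


def evolve(fitness, dim=GENOME_DIM, population=16, parents=4,
           generations=60, sigma=0.2, seed=0):
    """Simplified evolution strategy (not full CMA-ES)."""
    rng = random.Random(seed)
    mean = [0.5] * dim  # start from an even blend everywhere
    best_genome, best_score = list(mean), fitness(mean)
    for _ in range(generations):
        # Sample candidates around the current mean.
        candidates = [
            [clip01(m + rng.gauss(0.0, sigma)) for m in mean]
            for _ in range(population)
        ]
        candidates.sort(key=fitness, reverse=True)
        elite = candidates[:parents]
        # Move the mean to the average of the best candidates.
        mean = [sum(column) / parents for column in zip(*elite)]
        top_score = fitness(elite[0])
        if top_score > best_score:
            best_genome, best_score = elite[0], top_score
        sigma *= 0.97  # slowly narrow the search
    return best_genome, best_score


# Hypothetical target: ratios ramp up toward the last genes, loosely echoing
# the heavier blending reported for final layers.
target = [i / (GENOME_DIM - 1) for i in range(GENOME_DIM)]
start_score = toy_fitness([0.5] * GENOME_DIM, target)
best, score = evolve(lambda g: toy_fitness(g, target))
print(f"start={start_score:.3f} best={score:.3f}")
```

Because the loop needs only a score per genome and no gradients, it fits the card's description of breeding without additional training.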

Why This Matters

This model represents a breakthrough in model merging, demonstrating that performance gains can be achieved by combining existing, distinct architectures through evolutionary methods. It offers a pathway for creating more capable models without the extensive computational cost of training from scratch.