FINAL-Bench/Darwin-28B-Coder

Tags: Vision, Open Weights, Cold
Concurrency Cost: 2
Model Size: 27B
Quant: FP8
Ctx Length: 32k
Tool Calling: Supported
Published: May 20, 2026
License: apache-2.0
Architecture: Transformer

Darwin-28B-Coder is a 27-billion-parameter, code-specialized language model from VIDRAFT FINAL-Bench, built on the Darwin family (Qwen3.5-compatible) architecture with a 32K-token context length. It targets function-level code generation, complex-library composition, and tool/function calling, scoring 100.0% on HumanEval and 72.0% on BigCodeBench-Complete. On these benchmarks it competes directly with frontier models such as GPT-4o and Claude 3.5 Sonnet.

Darwin-28B-Coder: Code-Specialized LLM

Darwin-28B-Coder is a 27-billion-parameter model from VIDRAFT FINAL-Bench, designed for advanced code generation and function calling. It is built on the Darwin family architecture, which is compatible with Qwen3.5, and supports a 32K-token context length. Its results on HumanEval, BigCodeBench-Complete, MBPP, and function-calling benchmarks position it as a direct competitor to larger frontier models.

Key Capabilities & Performance

  • Exceptional Code Generation: Achieves 100.0% on HumanEval, surpassing GPT-4o (92.1%) and Claude 3.5 Sonnet (92.0%).
  • Complex Library Composition: Scores 72.0% on BigCodeBench-Complete, significantly outperforming GPT-4o (50.1%) and Qwen2.5-Coder-32B (49.6%), indicating strong capabilities in multi-library code generation.
  • Function Calling: Demonstrates high accuracy with 90.0% on the Function Calling (Simple) benchmark, comparable to Claude 3.7 Sonnet (~89%).
  • MBPP Performance: Achieves 84.0% on MBPP, competitive with other leading code models.
  • Training: Fine-tuned via a parameter-efficient adapter merge on the m-a-p/CodeFeedback-Filtered-Instruction dataset (Python, AST-validated).
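
To make the function-calling capability listed above concrete, here is a minimal sketch of how an application might dispatch a single tool call emitted by the model. It assumes the model returns the call as a JSON object with "name" and "arguments" fields, which is a common convention but is not confirmed by this page. The get_weather tool, the TOOLS registry, and the sample model_output string are all hypothetical and exist only for illustration.

```python
import json
from typing import Any, Callable


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical local tool: returns canned data instead of calling a real service."""
    return {"city": city, "unit": unit, "temperature": 21}


# Registry mapping tool names to Python callables (names are illustrative only).
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(raw_call: str) -> Any:
    """Parse one JSON tool call and invoke the matching registered function."""
    call = json.loads(raw_call)
    name = call["name"]
    arguments = call.get("arguments", {})
    # Assumption: some serving stacks encode arguments as a JSON string.
    if isinstance(arguments, str):
        arguments = json.loads(arguments)
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


# Stand-in for text the model might emit; not real model output.
model_output = '{"name": "get_weather", "arguments": {"city": "Seoul"}}'
result = dispatch_tool_call(model_output)
print(result)
```

In an agentic loop, the returned value would typically be serialized and sent back to the model as the tool result; validating arguments against a schema before the call is advisable.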

Recommended Use Cases

  • Function-level code generation: Ideal for generating specific functions or code snippets.
  • Complex code composition: Suited for tasks requiring the integration of multiple libraries.
  • Tool and function calling: Excellent for agentic workflows and interacting with external tools.
  • High-correctness code: When correctness is critical, sample multiple candidates and choose among them by test-driven selection or ensemble voting.
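
The multi-sample strategies in the last item can be sketched as follows. This is an illustrative outline, not the authors' evaluation harness: the candidates list stands in for several completions sampled from the model, and the helper names (passes_tests, select_by_tests, select_by_vote) are hypothetical. Test-driven selection returns the first candidate that passes known test cases; ensemble voting groups candidates by their outputs on probe inputs and returns one from the largest group.

```python
from typing import Optional


def _load(source: str, func_name: str):
    """Execute candidate source in a fresh namespace and return the named function."""
    namespace: dict = {}
    # Caution: exec runs arbitrary code; sandbox untrusted model output in real use.
    exec(source, namespace)
    return namespace[func_name]


def passes_tests(source: str, func_name: str, tests: list) -> bool:
    """Return True if the candidate matches every (args, expected) pair."""
    try:
        func = _load(source, func_name)
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False  # syntax errors, missing function, or runtime failures


def select_by_tests(candidates: list, func_name: str, tests: list) -> Optional[str]:
    """Test-driven selection: first candidate that passes all tests, else None."""
    for source in candidates:
        if passes_tests(source, func_name, tests):
            return source
    return None


def select_by_vote(candidates: list, func_name: str, probe_inputs: list) -> Optional[str]:
    """Ensemble voting: pick a candidate from the largest group of agreeing outputs."""
    groups: dict = {}
    for source in candidates:
        try:
            func = _load(source, func_name)
            signature = tuple(repr(func(*args)) for args in probe_inputs)
        except Exception:
            continue  # candidates that crash do not get a vote
        groups.setdefault(signature, []).append(source)
    if not groups:
        return None
    return max(groups.values(), key=len)[0]


# Hard-coded stand-ins for three sampled completions (one is buggy).
candidates = [
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return b + a\n",
]
tests = [((1, 2), 3), ((0, 0), 0)]
probe_inputs = [(1, 2), (5, 3)]

chosen_by_tests = select_by_tests(candidates, "add", tests)
chosen_by_vote = select_by_vote(candidates, "add", probe_inputs)
```

Test-driven selection needs trusted test cases, while voting only needs inputs, so voting is the fallback when no reference outputs exist. Neither helper guards against infinite loops; a real pipeline would run candidates in a sandbox with a timeout.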