FINAL-Bench/Darwin-28B-Coder
Darwin-28B-Coder is a 27 billion parameter, code-specialized language model from VIDRAFT FINAL-Bench, built on the Darwin family (Qwen3.5-compatible) architecture with a 32K token context length. It excels in function-level code generation, complex-library composition, and tool/function calling, achieving 100.0% on HumanEval and 72.0% on BigCodeBench-Complete. This model is optimized for high-performance code tasks, competing directly with frontier models like GPT-4o and Claude 3.5 Sonnet.
Darwin-28B-Coder: Code-Specialized LLM
Darwin-28B-Coder is a 27-billion-parameter model from VIDRAFT FINAL-Bench, designed for advanced code generation and function calling. It is built on the Darwin family architecture, which is compatible with Qwen3.5, and supports a 32K-token context length. The model performs strongly across several code benchmarks, which positions it as a direct competitor to larger frontier models.
Key Capabilities & Performance
- Exceptional Code Generation: Achieves 100.0% on HumanEval, surpassing GPT-4o (92.1%) and Claude 3.5 Sonnet (92.0%).
- Complex Library Composition: Scores 72.0% on BigCodeBench-Complete, significantly outperforming GPT-4o (50.1%) and Qwen2.5-Coder-32B (49.6%), indicating strong capabilities in multi-library code generation.
- Function Calling: Demonstrates high accuracy with 90.0% on the Function Calling (Simple) benchmark, comparable to Claude 3.7 Sonnet (~89%).
- MBPP Performance: Achieves 84.0% on MBPP, competitive with other leading code models.
- Training: Fine-tuned via a parameter-efficient adapter merge on m-a-p/CodeFeedback-Filtered-Instruction (Python, AST-validated) data.
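The card does not say how the training data was AST-validated. A common reading is that each Python sample was kept only if it parses into a syntax tree. The sketch below illustrates that idea with the standard-library `ast` module; the helper name `is_valid_python` and the sample strings are hypothetical, and the actual filtering pipeline may differ.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python (hypothetical helper)."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        # SyntaxError: malformed code; ValueError: e.g. null bytes in source.
        return False


# Illustrative samples only, not taken from the actual dataset.
samples = [
    "def add(a, b):\n    return a + b\n",
    "def broken(:\n    pass\n",
]

# Keep only the samples that parse cleanly.
valid_samples = [s for s in samples if is_valid_python(s)]
```

Parsing checks syntax only. It says nothing about whether the code runs correctly, so a filter like this removes malformed samples but not logically wrong ones.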
Recommended Use Cases
- Function-level code generation: Ideal for generating specific functions or code snippets.
- Complex code composition: Suited for tasks requiring the integration of multiple libraries.
- Tool and function calling: Excellent for agentic workflows and interacting with external tools.
- High-correctness code: For correctness-critical tasks, the recommended inference strategies are multi-sample generation with test-driven selection, or ensemble voting.
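Multi-sample generation with test-driven selection can be sketched as follows: draw several candidate completions, run each against a small set of tests, and keep the candidate that passes the most. This is a minimal illustration, not the authors' implementation. The candidates are hard-coded strings standing in for model samples, and `count_passed` and `select_best` are hypothetical helper names. Running generated code with `exec` is unsafe outside a sandbox.

```python
def count_passed(candidate_src: str, func_name: str, tests: list) -> int:
    """Run a candidate in a fresh namespace and count passing test cases."""
    namespace: dict = {}
    try:
        # WARNING: exec runs arbitrary code; use a sandbox for real model output.
        exec(candidate_src, namespace)
        func = namespace[func_name]
    except Exception:
        return 0  # Candidate failed to compile or does not define the function.
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # A runtime error counts as a failed test.
    return passed


def select_best(candidates: list, func_name: str, tests: list):
    """Return the candidate passing the most tests (earliest wins on ties)."""
    best, best_score = None, -1
    for src in candidates:
        score = count_passed(src, func_name, tests)
        if score > best_score:
            best, best_score = src, score
    return best


# Stand-ins for several samples drawn from the model for the same prompt.
candidates = [
    "def add(a, b):\n    return a - b\n",   # wrong logic
    "def add(a, b):\n    return a + b\n",   # correct
    "def add(a, b) return a + b\n",         # syntax error
]
# Each test is (arguments, expected result).
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

best = select_best(candidates, "add", tests)
```

Ensemble voting follows the same pattern with a different scoring rule: instead of counting passed tests, group candidates by the outputs they produce on shared inputs and pick one from the largest group.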