agentica-org/DeepCoder-14B-Preview

Parameters: 14.8B
Quantization: FP8
Context length: 131,072 tokens
License: MIT
Overview

DeepCoder-14B-Preview: Code Reasoning LLM

DeepCoder-14B-Preview, developed by Agentica, is a 14.8-billion-parameter language model designed specifically for code reasoning. It is fine-tuned from the DeepSeek-R1-Distill-Qwen-14B base model using an improved distributed reinforcement learning (RL) recipe, GRPO+, combined with iterative context lengthening.

Key Capabilities & Performance

  • Exceptional Code Performance: Achieves 60.6% Pass@1 accuracy on LiveCodeBench v5 (8/1/24-2/1/25), an 8% improvement over its base model and comparable to OpenAI's o3-mini. It also achieves a Codeforces rating of 1936 and 92.6% on HumanEval+.
  • Long Context Reasoning: Generalizes well to long contexts, supporting up to 131,072 tokens despite being trained with a 32K context. This is facilitated by DAPO's overlong filtering technique.
  • Advanced Training Methodology: Incorporates GRPO+ enhancements, including offline difficulty filtering, removal of the entropy and KL loss terms for training stability, and 'Clip High' for better exploration (see the sketch after this list).
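
As an illustration of the 'Clip High' idea, the sketch below shows a PPO-style surrogate with an asymmetric clipping range, where a larger upper bound gives low-probability tokens more room to grow and thus encourages exploration. The epsilon values and the function itself are illustrative assumptions, not the exact GRPO+ implementation.

```python
import torch

def clip_high_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Sketch of an asymmetric 'Clip High' policy-gradient surrogate.

    ratio:     pi_new(token) / pi_old(token), per-token probability ratio
    advantage: per-token (group-normalized) advantage estimate
    The upper clip bound (1 + eps_high) is larger than the lower one
    (1 - eps_low), so positively-rewarded low-probability tokens can be
    boosted more aggressively than standard symmetric PPO clipping allows.
    """
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantage
    # Maximize the surrogate => minimize its negation.
    return -torch.mean(torch.minimum(unclipped, clipped))
```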

Training Details

The model was trained on approximately 24,000 unique problem-test pairs from datasets like Taco-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench v5. Its iterative context lengthening process scaled training from 16K to 32K contexts, enabling strong generalization to 64K contexts.
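
The following is a minimal sketch of what such an iterative context-lengthening loop could look like; the `train_rl` routine is hypothetical, and only the 16K and 32K window sizes come from the description above.

```python
# Hypothetical training driver: each stage resumes from the previous
# checkpoint while raising the maximum context length used during RL.
CONTEXT_SCHEDULE = [16_384, 32_768]  # tokens per stage

def iterative_context_lengthening(model, dataset, train_rl):
    for max_context in CONTEXT_SCHEDULE:
        model = train_rl(model, dataset, max_context_tokens=max_context)
    return model
```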

Usage Recommendations

For optimal performance, users are advised to avoid system prompts, placing all instructions within the user prompt. Recommended inference parameters include temperature = 0.6, top_p = 0.95, and max_tokens set to at least 64000 to leverage its long-context capabilities.
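
Below is a minimal usage sketch following these recommendations, assuming the model is served behind an OpenAI-compatible endpoint (for example via vLLM); the base URL, API key, and prompt are placeholders.

```python
from openai import OpenAI

# Placeholder endpoint; point this at wherever the model is actually served.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="agentica-org/DeepCoder-14B-Preview",
    # No system prompt: all instructions go in the user message.
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that checks whether a string is a palindrome.",
        }
    ],
    temperature=0.6,
    top_p=0.95,
    max_tokens=64000,  # leave ample room for long reasoning traces
)
print(response.choices[0].message.content)
```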

License

DeepCoder-14B-Preview is released under the MIT License, promoting open and accessible AI development.