agentica-org/DeepCoder-1.5B-Preview

Parameters: 1.5B
Precision: BF16
Context length: 131,072 tokens
Released: Apr 7, 2025
License: MIT
Model page: Hugging Face
Overview

DeepCoder-1.5B-Preview: Code Reasoning LLM

DeepCoder-1.5B-Preview is a 1.5 billion parameter language model developed by agentica-org and designed specifically for code reasoning. It is fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B using distributed reinforcement learning (RL) techniques that scale to long context lengths.
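As a rough quick start, the sketch below loads the model with the Hugging Face transformers library and generates a solution to a short coding prompt. The prompt, token budget, and sampling settings (temperature 0.6, top-p 0.95) are illustrative assumptions, not official recommendations.

```python
# Minimal sketch: load DeepCoder-1.5B-Preview and generate code for a
# short prompt. Sampling settings here are assumptions, not the card's
# official recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepCoder-1.5B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that returns the n-th Fibonacci number."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```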

Key Differentiators & Capabilities

  • Reinforcement Learning (RL) for LLMs: Leverages GRPO+, an improved variant of the GRPO algorithm that incorporates insights from DAPO, for more stable and effective training.
  • Iterative Context Lengthening: Trained on progressively longer contexts, the model supports a 131,072-token window and generalizes well to long inputs, aided by techniques such as DAPO's overlong filtering (a serving sketch follows this list).
  • Enhanced Code Performance: Significantly outperforms its base model, DeepSeek-R1-Distill-Qwen-1.5B, across coding benchmarks including LiveCodeBench (LCB v5), Codeforces, and HumanEval+ (base-model scores in parentheses):
    • LCB (v5): 25.1 (vs. 16.9)
    • HumanEval+: 73.0 (vs. 58.3)
  • Open-Source & Accessible: Released under the MIT License, promoting open AI development and collaboration.
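To exercise the long-context window in practice, a minimal serving sketch with vLLM might look like the following. The max_model_len value mirrors the advertised window; the input file, question, and sampling parameters are hypothetical placeholders.

```python
# Minimal sketch: serving the full 131,072-token context window with vLLM.
# The input file and sampling values are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="agentica-org/DeepCoder-1.5B-Preview", max_model_len=131072)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

# Hypothetical long input: a concatenated dump of a large codebase.
long_context = open("codebase_dump.txt").read()
prompt = long_context + "\n\nExplain the purpose of the main event loop above."
print(llm.generate([prompt], params)[0].outputs[0].text)
```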

Training Innovations

The model's training recipe includes several enhancements to the GRPO algorithm (an illustrative loss sketch follows the list):

  • Offline Difficulty Filtering: Ensures a suitable difficulty range in the training dataset without runtime overhead.
  • No Entropy or KL Loss: Drops the entropy bonus and KL penalty entirely, sidestepping the instability these terms often introduce in RL training.
  • Overlong Filtering & Clip High: Techniques adopted from DAPO to preserve long-context reasoning and encourage exploration.
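The sketch below shows how these pieces could fit together in a GRPO-style surrogate loss. It is a simplified illustration, not the released training code; the function name is hypothetical, and the clipping values (eps_low=0.2, eps_high=0.28) are assumptions borrowed from the DAPO paper.

```python
# Illustrative sketch (not the released training code) of a GRPO-style
# clipped surrogate with DAPO's asymmetric "clip high", overlong filtering,
# and no entropy or KL terms. Clipping values are assumptions.
import torch

def grpo_plus_loss(logp_new, logp_old, rewards, truncated,
                   eps_low=0.2, eps_high=0.28):
    """
    logp_new, logp_old: (G, T) per-token log-probs for G sampled responses.
    rewards:            (G,)   scalar reward per response.
    truncated:          (G,)   bool, True if a response hit the length cap.
    """
    # Group-normalized advantages: score each response against its group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    adv = adv.unsqueeze(-1)  # broadcast over tokens

    ratio = torch.exp(logp_new - logp_old)  # importance-sampling ratio
    # Clip high: a looser upper bound (eps_high > eps_low) lets the
    # probability of rare tokens grow, encouraging exploration.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    surrogate = torch.minimum(ratio * adv, clipped * adv)

    # Overlong filtering: mask out responses truncated by the length
    # budget so long reasoning chains are not unfairly penalized.
    keep = (~truncated).float().unsqueeze(-1)
    n_tokens = (keep.sum() * surrogate.shape[-1]).clamp(min=1.0)

    # No entropy bonus and no KL penalty, per the recipe above.
    return -(surrogate * keep).sum() / n_tokens
```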

Ideal Use Cases

  • Code Generation: Excels at generating functional and complex code solutions.
  • Code Problem Solving: Strong performance on competitive programming and coding challenge benchmarks.
  • Long-Context Code Analysis: Capable of handling and reasoning over very long codebases or problem descriptions due to its extended context window.