Overview
DeepCoder-1.5B-Preview: Code Reasoning LLM
DeepCoder-1.5B-Preview is a 1.5-billion-parameter language model developed by agentica-org, designed specifically for code reasoning. It is fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B using distributed reinforcement learning (RL).
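For quick experimentation, here is a minimal sketch of loading and prompting the model with Hugging Face Transformers; the Hub id agentica-org/DeepCoder-1.5B-Preview and the sampling settings are assumptions rather than an official recipe.

```python
# Minimal sketch: loading DeepCoder-1.5B-Preview with Hugging Face Transformers.
# The repo id and sampling settings are assumptions, not an official recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepCoder-1.5B-Preview"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Write a Python function that returns the n-th Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.95
)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```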
Key Differentiators & Capabilities
- Reinforcement Learning (RL) for LLMs (RLLM): Leverages an improved GRPO+ algorithm, incorporating insights from DAPO, for more stable and effective training.
- Iterative Context Lengthening: Achieves a context length of 131,072 tokens (128K), generalizing well to long contexts thanks to techniques such as DAPO's overlong filtering (an illustrative training schedule is sketched after this list).
- Enhanced Code Performance: Significantly outperforms its base model, DeepSeek-R1-Distill-Qwen-1.5B, across coding benchmarks including LiveCodeBench (LCB v5), Codeforces, and HumanEval+:
  - LCB (v5): 25.1 (vs. 16.9)
  - HumanEval+: 73.0 (vs. 58.3)
- Open-Source & Accessible: Released under the MIT License, promoting open AI development and collaboration.
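To make the iterative context lengthening idea concrete, the sketch below models it as a staged cap on training sequence length that is raised as training progresses; the stage lengths and step counts are hypothetical, since the exact schedule is not given here.

```python
# Illustrative sketch of iterative context lengthening: the max sequence length
# used during RL training is raised in stages, so the policy first masters
# short contexts before extending to longer ones. The stage values and
# steps_per_stage are hypothetical placeholders.
CONTEXT_STAGES = [16_384, 32_768, 65_536]  # hypothetical stage lengths

def max_len_for_step(step: int, steps_per_stage: int = 1_000) -> int:
    """Return the training-time context cap for a given optimizer step."""
    stage = min(step // steps_per_stage, len(CONTEXT_STAGES) - 1)
    return CONTEXT_STAGES[stage]

# Example: the cap grows with training progress.
for step in (0, 1_000, 2_500):
    print(step, max_len_for_step(step))

# At evaluation time the model is run well beyond the final training cap
# (up to 131,072 tokens), relying on overlong filtering during training to
# keep long-context reasoning intact.
```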
Training Innovations
The model's training recipe includes several enhancements to the GRPO algorithm:
- Offline Difficulty Filtering: Ensures a suitable difficulty range in the training dataset without runtime overhead.
- No Entropy or KL Loss: Eliminates instability issues often associated with these terms in RL training.
- Overlong Filtering & Clip High: Techniques adopted from DAPO to preserve long-context reasoning and encourage exploration (see the combined loss sketch after this list).
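The sketch below combines these points into a single GRPO+-style policy loss under assumed tensor shapes: asymmetric clipping (the DAPO-style "clip high"), overlong filtering via masking, and no KL or entropy terms. It illustrates the recipe rather than reproducing the authors' implementation; the eps_low/eps_high values are placeholders.

```python
# Sketch of a GRPO+-style policy loss reflecting the recipe above.
# Tensor shapes and clipping values are assumptions, not the authors' code.
import torch

def grpo_plus_loss(
    logp_new: torch.Tensor,    # (B, T) log-probs under the current policy
    logp_old: torch.Tensor,    # (B, T) log-probs under the rollout policy
    advantages: torch.Tensor,  # (B,) group-normalized advantage per response
    token_mask: torch.Tensor,  # (B, T) 1.0 for response tokens, 0.0 for padding
    truncated: torch.Tensor,   # (B,) bool, True if response hit the length cap
    eps_low: float = 0.2,
    eps_high: float = 0.28,    # hypothetical DAPO-style asymmetric "clip high"
) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)                  # importance ratios
    adv = advantages.unsqueeze(1)                           # broadcast over tokens
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    per_token = -torch.minimum(ratio * adv, clipped * adv)  # PPO-style surrogate

    # Overlong filtering: drop truncated responses from the loss entirely, so
    # the model is not penalized for reasoning that ran out of context budget.
    mask = token_mask * (~truncated).float().unsqueeze(1)
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
    # Note: no KL penalty and no entropy bonus, per the recipe above.
```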
Ideal Use Cases
- Code Generation: Excels at generating functional and complex code solutions.
- Code Problem Solving: Strong performance on competitive programming and coding challenge benchmarks.
- Long-Context Code Analysis: Capable of handling and reasoning over very long codebases or problem descriptions due to its extended context window (see the serving sketch below).
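One way to exercise the long-context use case is to serve the model with vLLM and raise max_model_len to the full window; the settings and the input file name below are illustrative assumptions.

```python
# Minimal sketch: long-context inference with vLLM. max_model_len and the
# sampling values are assumptions chosen to match the 131,072-token window.
from vllm import LLM, SamplingParams

llm = LLM(
    model="agentica-org/DeepCoder-1.5B-Preview",  # assumed Hub id
    max_model_len=131072,
)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)

with open("large_codebase_dump.py") as f:  # hypothetical long input file
    code = f.read()

prompt = f"Review the following code and point out bugs:\n\n{code}"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```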