Qwen2.5-Coder-3B Overview
Qwen2.5-Coder-3B is a 3.09 billion parameter model from the Qwen2.5-Coder family, a series of code-specific large language models developed by the Qwen team. It builds on the strong Qwen2.5 foundation, with significant improvements in coding capabilities from extensive pretraining on 5.5 trillion tokens spanning source code, text-code grounding data, and synthetic data.
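As a base model, it can be used for plain code completion out of the box. The snippet below is a minimal sketch using the Hugging Face transformers library; the prompt and generation settings are illustrative, not recommendations.

```python
# Minimal sketch: plain code completion with the base model.
# Assumes the transformers library (plus accelerate for
# device_map="auto") and access to the Qwen/Qwen2.5-Coder-3B
# checkpoint on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Base models complete text; they are not chat-tuned, so we
# prompt with code rather than a conversational message.
prompt = "def quicksort(arr):\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```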
Key Capabilities
- Enhanced Code Performance: Demonstrates significant advancements in code generation, code reasoning, and code fixing.
- Comprehensive Foundation: Designed to support real-world applications like Code Agents, while also maintaining strong performance in mathematics and general language understanding.
- Technical Specifications: A 3.09 billion parameter transformer with a 32,768-token context length, using RoPE positional embeddings, the SwiGLU activation, and RMSNorm (see the configuration sketch after this list).
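These architectural details can be read directly from the published model configuration. The sketch below assumes the transformers library; the field names follow the Qwen2 configuration schema, and the exact values come from the checkpoint itself.

```python
# Sketch: inspect architecture details from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-3B")
print(config.max_position_embeddings)  # context length, expected 32768
print(config.hidden_act)               # "silu", the gate activation in SwiGLU
print(config.rms_norm_eps)             # epsilon used by the RMSNorm layers
print(config.rope_theta)               # base frequency for RoPE
```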
Intended Use
This base model is intended primarily as a starting point for post-training, such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), or continued pretraining, to adapt it to specific conversational or fill-in-the-middle tasks. It is not recommended for direct conversational use without such fine-tuning; a minimal fine-tuning sketch follows below.
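For teams planning post-training, a parameter-efficient SFT run is a common first step. The sketch below is illustrative only: it assumes the transformers, peft, and datasets libraries, and the LoRA settings, toy dataset, and hyperparameters are placeholders rather than recommended values.

```python
# Minimal sketch of parameter-efficient SFT on the base model.
# All hyperparameters and the toy dataset are illustrative.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Qwen/Qwen2.5-Coder-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Attach small LoRA adapters instead of updating all 3.09B weights.
lora = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Toy instruction-style example; real SFT would use a curated corpus.
examples = [
    {"text": "### Task: reverse a string\ndef reverse(s):\n    return s[::-1]\n"}
]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen-coder-sft",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

LoRA is one reasonable choice here because it keeps memory requirements modest for a 3B model; full-parameter SFT or continued pretraining follow the same Trainer pattern without the peft adapter step.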