Qwen2.5-Coder-1.5B Overview
Qwen2.5-Coder-1.5B is a 1.54 billion parameter pre-trained causal language model, part of the Qwen2.5-Coder series developed by Qwen. The model succeeds CodeQwen1.5 and brings significant enhancements in coding capabilities. It is built on the Qwen2.5 architecture: a transformer with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings.
Key Capabilities & Features
- Enhanced Code Performance: Offers significant improvements in code generation, code reasoning, and code fixing.
- Extensive Training: Trained on 5.5 trillion tokens, including a large proportion of source code, text-code grounding, and synthetic data.
- Long Context Support: Supports a full context length of 131,072 tokens, using YaRN for length extrapolation, though the default config.json is set to 32,768 tokens.
- Foundation for Code Agents: Designed to provide a comprehensive base for real-world applications such as Code Agents, while maintaining strengths in mathematics and general competencies.
- Architecture: Comprises 28 layers and uses grouped-query attention (GQA) with 12 query heads and 2 key/value heads.
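To enable the full 131,072-token context, the Qwen documentation recommends adding a YaRN `rope_scaling` entry to `config.json`; the factor of 4.0 corresponds to 131,072 / 32,768. A sketch of the relevant fragment (verify against the official model card before use):

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

Note that static YaRN scaling of this kind applies to all inputs, so it may slightly affect quality on short sequences; it is best enabled only when long contexts are actually needed.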
Good For
- Code-Specific Tasks: Ideal for tasks requiring advanced code generation, reasoning, and fixing.
- Further Fine-tuning: Recommended as a base language model for post-training applications like SFT, RLHF, or continued pretraining, rather than direct conversational use.
- Long Code Contexts: Suitable for applications requiring processing and understanding of very long code inputs, especially when configured with YaRN for extended context handling.
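Because this is a base (non-instruct) checkpoint, it is typically driven with completion-style prompts rather than chat turns. The Qwen2.5-Coder series also ships fill-in-the-middle (FIM) special tokens, which suit code-fixing and infilling tasks; a minimal prompt-building sketch follows (the token strings are taken from the Qwen2.5-Coder documentation and should be confirmed against the tokenizer's special-token map):

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt for a code infilling request.

    The model is expected to generate the code that belongs between
    `prefix` and `suffix`, emitting it after the <|fim_middle|> marker.
    Token names are assumed from the Qwen2.5-Coder docs; check
    tokenizer.special_tokens_map for the exact strings.
    """
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"


# Example: ask the model to fill in a function body.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
```

The resulting string would then be tokenized and passed to the model for generation as with any causal LM completion.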