Qwen2.5-Coder-0.5B Overview
This model is the 0.5 billion parameter variant of the Qwen2.5-Coder series, a family of code-specific large language models developed by the Qwen team. Building upon the Qwen2.5 architecture, the series significantly enhances code generation, code reasoning, and code fixing. Its training data was scaled up to 5.5 trillion tokens, incorporating source code, text-code grounding data, and synthetic data.
Key Capabilities
- Enhanced Coding Abilities: Demonstrates significant improvements in generating, reasoning about, and fixing code.
- Comprehensive Foundation: Designed to support real-world applications like Code Agents, while maintaining strengths in mathematics and general competencies.
- Causal Language Model: Utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
- Extended Context: Features a full context length of 32,768 tokens.
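To make the architecture bullet above concrete, here is a minimal sketch of the core rotation behind RoPE (rotary position embedding). This is an illustration of the mechanism only, not the model's actual implementation, which applies the rotation to full query/key tensors inside optimized attention kernels:

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each (even, odd) dimension pair of `vec` by a
    position-dependent angle, as in rotary position embedding.
    Frequencies decrease with the dimension index, so early pairs
    encode fine-grained position and later pairs coarse position."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # angle for this dimension pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out
```

The useful property is that the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n, which is how RoPE injects relative position into attention scores.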
Intended Use
This 0.5B model is a pretrained base language model and is not recommended for direct conversational use. Instead, it serves as a strong foundation for further post-training, such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining, to adapt it for specific coding tasks or fill-in-the-middle applications. For detailed evaluation results and further information, refer to the Qwen2.5-Coder blog and GitHub repository.
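For the fill-in-the-middle use case mentioned above, a prompt interleaves the code before and after the gap using special tokens. The sketch below assumes the `<|fim_prefix|>` / `<|fim_suffix|>` / `<|fim_middle|>` token names used elsewhere in the Qwen2.5-Coder series; verify them against the model tokenizer's special-token list before relying on them:

```python
# Assumed FIM special tokens for Qwen2.5-Coder; check the tokenizer's
# special-token map to confirm the exact strings.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt: the model is expected to
    generate the code that belongs between `prefix` and `suffix`,
    continuing from the trailing <|fim_middle|> marker."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    "def quicksort(arr):\n    if len(arr) <= 1:\n        return arr\n",
    "\n    return quicksort(left) + [pivot] + quicksort(right)\n",
)
```

The resulting string would then be tokenized and passed to the model (e.g. via `transformers`' `generate`); everything the model emits after `<|fim_middle|>` is the proposed middle segment.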