Qwen2.5-1.5B: An Enhanced Base Language Model
Qwen2.5-1.5B is a 1.54-billion-parameter base causal language model in the latest Qwen2.5 series developed by the Qwen Team. It builds on Qwen2 with substantial improvements across several key areas, making it a robust foundation for a wide range of NLP tasks.
Key Capabilities & Enhancements
- Expanded Knowledge & Specialized Skills: Significantly improved performance in coding and mathematics, leveraging specialized expert models.
- Advanced Instruction Following: Enhanced ability to follow instructions, generate long texts (over 8K tokens), and understand/generate structured data like JSON.
- Robustness: More resilient to diverse system prompts, improving role-play implementation and condition-setting for chatbots.
- Long-Context Support: Supports a context length of up to 32,768 tokens and generation of up to 8,192 tokens.
- Multilingual Support: Offers comprehensive support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
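The structured-data capability above typically pairs with a validation step on the consumer side. The sketch below shows one minimal way to check that generated text is well-formed JSON before using it; the output string is a hypothetical stand-in for what a post-trained model might generate, not actual model output:

```python
import json

# Hypothetical model output; in practice this string would come from
# the (post-trained) model's generation, not be hard-coded.
model_output = '{"name": "Qwen2.5-1.5B", "context_length": 32768}'

def parse_structured_output(text: str) -> dict:
    """Validate that generated text is well-formed JSON before downstream use.

    Raises json.JSONDecodeError if the model emitted malformed JSON,
    which callers can catch to trigger a retry or fallback.
    """
    return json.loads(text)

parsed = parse_structured_output(model_output)
print(parsed["context_length"])
```

In a real pipeline the `json.JSONDecodeError` branch is where you would re-prompt or fall back to free-text parsing.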
Architecture & Features
This base model uses a transformer architecture with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings. It has 28 layers and uses grouped-query attention (GQA) with 12 query heads and 2 key/value heads.
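The 12-query/2-KV head split can be illustrated with a minimal NumPy sketch: each KV head is shared by a group of query heads (12 / 2 = 6 here). This is a simplified single-batch, unmasked illustration of the GQA idea, not Qwen's actual implementation:

```python
import numpy as np

def gqa_attention(q, k, v, n_q_heads=12, n_kv_heads=2):
    """Minimal grouped-query attention sketch.

    Each KV head is repeated across its group of query heads
    (n_q_heads // n_kv_heads = 6 queries per KV head, as in the
    12-query/2-KV configuration described above).
    """
    group = n_q_heads // n_kv_heads
    k = np.repeat(k, group, axis=0)  # (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_q_heads, seq, d)

rng = np.random.default_rng(0)
seq, d = 4, 8
q = rng.standard_normal((12, seq, d))
k = rng.standard_normal((2, seq, d))
v = rng.standard_normal((2, seq, d))
out = gqa_attention(q, k, v)
print(out.shape)  # (12, 4, 8)
```

The payoff of GQA is that the KV cache stores only 2 heads instead of 12, cutting cache memory at long context lengths while keeping full query-head capacity.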
When to Use This Model
As a base language model, Qwen2.5-1.5B is not recommended for direct conversational use. Instead, it is ideal for developers looking to apply post-training techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining to create highly specialized and performant models for specific applications.
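As one hedged illustration of SFT preparation, instruction/response records are typically rendered into training strings with a chat-style template before tokenization. The template and function name below are generic assumptions for illustration, not an official Qwen2.5 prompt format:

```python
# Minimal sketch: turn raw (instruction, response) records into SFT
# training strings. The <|user|>/<|assistant|> markers are a generic
# illustration, not the official Qwen2.5 chat template.
def format_sft_example(instruction: str, response: str) -> str:
    return f"<|user|>\n{instruction}\n<|assistant|>\n{response}"

records = [
    ("Translate 'bonjour' to English.", "Hello."),
    ("What is 2 + 2?", "4"),
]
sft_texts = [format_sft_example(i, r) for i, r in records]
print(sft_texts[0])
```

In practice, the formatted strings would then be tokenized and fed to a fine-tuning loop, with loss usually masked so it is computed only on the response tokens.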