Qwen2.5-1.5B: An Enhanced Base Language Model
Qwen2.5-1.5B is a 1.54-billion-parameter causal language model from the Qwen2.5 series developed by the Qwen Team. This base model is built on a transformer architecture incorporating RoPE, SwiGLU, and RMSNorm, and supports a context length of 32,768 tokens. It advances on the Qwen2 generation, with enhanced capabilities across several key areas.
Key Capabilities & Improvements
- Knowledge & Reasoning: Significantly more general knowledge and stronger coding and mathematics, thanks to specialized expert models in these domains.
- Instruction Following: Better at following instructions, generating long texts (over 8K tokens), and understanding and generating structured data such as JSON.
- Robustness: More resilient to diverse system prompts, improving role-play implementation and condition-setting for chatbots.
- Long-Context Support: The Qwen2.5 series supports contexts of up to 128K tokens and generation of up to 8K tokens; this 1.5B model supports a 32,768-token context window.
- Multilingual Support: Covers over 29 languages, including Chinese, English, French, Spanish, German, Russian, Japanese, Korean, Arabic, and more.
Intended Use
This repository contains the base Qwen2.5-1.5B model, which is primarily intended for further fine-tuning. It is not recommended for direct conversational use without additional post-training steps such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining. Developers can leverage this robust base model for building specialized applications requiring strong foundational language understanding and generation capabilities.
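As a minimal sketch of loading the base model for experimentation or as a starting point for fine-tuning, the standard Hugging Face `transformers` pattern applies (the prompt text here is an arbitrary illustration; note that, as a base model, the output is a raw continuation rather than a chat-style answer):

```python
# Sketch: load the base Qwen2.5-1.5B checkpoint and run a plain completion.
# Requires the `transformers` and `torch` packages and network access to
# download the checkpoint on first use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers on available GPU/CPU automatically
)

# Base-model usage: plain text continuation, no chat template.
prompt = "The Qwen2.5 series of language models"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For conversational behavior, this checkpoint would first be post-trained (e.g. SFT) or swapped for the separately released instruction-tuned variant.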