cjc999/Qwen2.5-14B
Qwen2.5-14B is a 14.7 billion parameter causal language model developed by the Qwen Team, featuring a transformer architecture with RoPE, SwiGLU, and RMSNorm. This base model, part of the Qwen2.5 series, offers significantly improved knowledge, coding, and mathematics capabilities compared to its predecessor, Qwen2. It supports a context length of up to 131,072 tokens. It is a pretrained model that has not been post-trained, so further fine-tuning is recommended before using it in conversational applications.
Qwen2.5-14B Overview
Qwen2.5-14B is a 14.7 billion parameter base causal language model from the Qwen2.5 series, developed by the Qwen Team. It builds upon the Qwen2 architecture, incorporating improvements in several key areas.
Key Capabilities & Improvements
- Enhanced Knowledge & Reasoning: Significantly improved capabilities in general knowledge, coding, and mathematics, leveraging specialized expert models.
- Instruction Following: Offers substantial improvements in following instructions and generating long texts (over 8K tokens).
- Structured Data & Output: Better understanding of structured data like tables and improved generation of structured outputs, including JSON.
- Robustness: More resilient to diverse system prompts, enhancing role-play and chatbot condition-setting.
- Long Context: Supports an extensive context length of up to 131,072 tokens.
- Multilingual Support: Supports more than 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
Model Architecture
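This base model uses a transformer architecture with RoPE, SwiGLU, RMSNorm, and bias terms on the attention QKV projections. It has 48 layers and uses grouped-query attention with 40 query heads and 8 key/value heads, so each key/value head is shared by 5 query heads.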
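To make the effect of the 40/8 head split concrete, the sketch below estimates the key/value cache size at the full 131,072-token context. The layer and head counts come from the description above; the head dimension of 128 and the 2-byte (16-bit) cache values are assumptions not stated on this page, and the `AttentionShape` class is a hypothetical helper written for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical helper holding the attention dimensions used below."""

    num_layers: int = 48      # from the model description
    num_q_heads: int = 40     # from the model description
    num_kv_heads: int = 8     # from the model description
    head_dim: int = 128       # ASSUMPTION: not stated on this page

    @property
    def q_heads_per_kv_head(self) -> int:
        # Number of query heads that share one key/value head.
        return self.num_q_heads // self.num_kv_heads

    def kv_cache_bytes(self, num_tokens: int, bytes_per_value: int = 2) -> int:
        # One key vector and one value vector per KV head, per layer, per token.
        per_token = 2 * self.num_layers * self.num_kv_heads * self.head_dim * bytes_per_value
        return per_token * num_tokens


FULL_CONTEXT = 131_072

shape = AttentionShape()
gqa_gib = shape.kv_cache_bytes(FULL_CONTEXT) / 2**30

# Same model shape, but with one KV head per query head (no sharing).
mha_gib = AttentionShape(num_kv_heads=40).kv_cache_bytes(FULL_CONTEXT) / 2**30

print(f"query heads per KV head: {shape.q_heads_per_kv_head}")
print(f"KV cache with 8 KV heads:  {gqa_gib:.1f} GiB")
print(f"KV cache with 40 KV heads: {mha_gib:.1f} GiB")
```

Under these assumptions, sharing key/value heads cuts the cache for a single full-length sequence by a factor of five. Actual memory use depends on the serving stack and the cache data type.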
Usage Recommendation
Qwen2.5-14B is a pretrained base language model and is not recommended for direct use in conversations. For conversational or instruction-following applications, apply post-training such as SFT (Supervised Fine-Tuning), RLHF (Reinforcement Learning from Human Feedback), or continued pretraining. For detailed evaluation results and further information, refer to the official Qwen2.5 blog and GitHub repository.
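Because a base model continues text rather than answering chat turns, a plain completion prompt is the natural way to try it before any post-training. The following is a minimal sketch, assuming the repository named in the page title can be loaded with the Hugging Face `transformers` library and that `accelerate` is installed for `device_map="auto"`. The `build_completion_prompt` and `complete` helpers and the Q/A prompt layout are hypothetical choices for this example, not part of the model.

```python
MODEL_ID = "cjc999/Qwen2.5-14B"  # taken from the page title; ASSUMED to be loadable


def build_completion_prompt(examples, query):
    """Format few-shot pairs as plain text for the base model to continue."""
    blocks = [f"Q: {question}\nA: {answer}" for question, answer in examples]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


def complete(prompt, max_new_tokens=32):
    """Generate a continuation; downloads the weights on first use."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


prompt = build_completion_prompt([("What is 2 + 2?", "4")], "What is 3 + 5?")
print(prompt)

# Uncomment to run generation (requires the model weights and enough memory):
# print(complete(prompt))
```

No chat template is applied here. A model that has been fine-tuned for conversation would normally be prompted through its tokenizer's chat template instead.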