tiansenwang/Qwen2.5-0.5B

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 23, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The Qwen2.5-0.5B model by Qwen is a 0.49-billion-parameter causal language model from the Qwen2.5 series, released as a pretrained base model. It features a transformer architecture with RoPE, SwiGLU, and RMSNorm, and supports a 32,768-token context length. The Qwen2.5 series brings significant improvements in knowledge, coding, mathematics, instruction following, and long text generation; this base model serves as a foundation for further fine-tuning.

Qwen2.5-0.5B Model Summary

This repository hosts the Qwen2.5-0.5B base model, a 0.49 billion parameter causal language model developed by the Qwen Team. It is part of the latest Qwen2.5 series, which introduces substantial enhancements over its predecessor, Qwen2. The model is built on a transformer architecture incorporating RoPE, SwiGLU, and RMSNorm, and supports a full context length of 32,768 tokens.
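As a quick orientation, the sketch below loads the model with Hugging Face transformers and cross-checks the figures cited above. The id Qwen/Qwen2.5-0.5B assumes the upstream Hugging Face release; substitute this repository's own id if loading from here. The Qwen2 architecture generally requires transformers>=4.37.0.

```python
# Minimal loading sketch; repo id is an assumption (upstream Hugging Face release).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Cross-check the specs cited above.
print(model.config.max_position_embeddings)  # 32768-token context length
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")
```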

Key Capabilities and Improvements

  • Enhanced Knowledge & Specialized Skills: Significantly improved capabilities in coding and mathematics, leveraging specialized expert models.
  • Instruction Following: Demonstrates marked improvements in adhering to instructions and generating structured outputs, including JSON.
  • Long Text Generation: Produces longer coherent outputs, generating up to 8,000 tokens (see the completion sketch after this list).
  • Structured Data Understanding: Improved ability to understand structured data, such as tables.
  • Robustness: More resilient to diverse system prompts, improving role-play implementation and condition-setting for chatbots.
  • Multilingual Support: Offers support for over 29 languages, including major global languages like Chinese, English, French, Spanish, and Japanese.
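Because this is a base model, the natural smoke test is plain text completion rather than chat. The sketch below is a minimal example: the prompt is hypothetical, the sampling settings (temperature, top_p, max_new_tokens) are illustrative rather than recommended, and the Qwen/Qwen2.5-0.5B id again assumes the upstream release.

```python
# Minimal completion sketch; prompt and sampling settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Base models continue text; give a prefix, not a chat turn.
prompt = "Rotary position embeddings work by"
inputs = tokenizer(prompt, return_tensors="pt")

# The card cites generation of up to 8,000 tokens; keep this short for a smoke test.
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```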

Intended Use

As a pretrained base model, Qwen2.5-0.5B is not recommended for direct conversational use. Instead, it is designed as a robust foundation for post-training techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining; a minimal SFT sketch follows below. Detailed evaluation results and further information are available in the official blog post and GitHub repository.
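For illustration, here is a minimal supervised fine-tuning sketch using the Hugging Face Trainer. It is not the Qwen team's training recipe: the sft_data.jsonl file, its "text" column, and all hyperparameters are hypothetical placeholders, and a real SFT run would typically also apply a chat template and mask prompt tokens in the loss.

```python
# Minimal SFT sketch with Hugging Face Trainer; dataset path, column name,
# and hyperparameters are hypothetical placeholders, not recommendations.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Defensive: ensure a pad token exists for batching.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Hypothetical instruction dataset with a "text" column.
dataset = load_dataset("json", data_files="sft_data.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2.5-0.5b-sft",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
    ),
    train_dataset=tokenized,
    # mlm=False yields standard causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```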