Qwen2.5-0.5B Model Summary
This repository hosts the base 0.5 billion parameter model from the Qwen2.5 series, developed by the Qwen team. Qwen2.5 advances over Qwen2 by incorporating knowledge from specialized expert models, significantly boosting capabilities in coding and mathematics and expanding its general knowledge base. The model also shows marked improvements in instruction following, long-text generation (up to 8K tokens), and understanding and generating structured data such as JSON.
Key Capabilities & Features
- Enhanced Core Abilities: Improved knowledge, coding, and mathematical reasoning.
- Instruction Following: More robust instruction adherence and greater resilience to diverse system prompts, supporting role-play and condition-setting for chatbots.
- Structured Data Handling: Better understanding of tables and generation of structured outputs like JSON.
- Long-Context Support: Features a full context length of 32,768 tokens.
- Multilingual Support: Designed to support over 29 languages, including Chinese, English, French, Spanish, and more.
- Architecture: Utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
Intended Use
This 0.49B parameter model is a pretrained base language model that has not been instruction-tuned, so it is not recommended for direct conversational use. Developers can apply post-training techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining to adapt it for specific applications.
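As a minimal sketch of how a base checkpoint like this is typically loaded for text completion (assuming the Hugging Face `transformers` library and a model ID of `Qwen/Qwen2.5-0.5B`, inferred from the series name; adjust the ID to match your copy):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID assumed from the series name; verify against the actual repo.
model_id = "Qwen/Qwen2.5-0.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place weights on GPU if available
)

# A base (non-instruction-tuned) model continues text rather than
# following chat turns, so prompt with a prefix to complete.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model is not chat-tuned, there is no chat template to apply here; for conversational behavior, use an instruction-tuned variant or fine-tune this base model first.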