tiansenwang/Qwen2.5-0.5B

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 23, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The Qwen2.5-0.5B model by Qwen is a 0.49-billion-parameter causal language model from the Qwen2.5 series, released as a pretrained base model. It features a transformer architecture with RoPE, SwiGLU, and RMSNorm, and supports a 32,768-token context length. The Qwen2.5 series brings significant improvements in knowledge, coding, mathematics, instruction following, and long text generation; this base model serves as a foundation for further fine-tuning.

Qwen2.5-0.5B Model Summary

This repository hosts the Qwen2.5-0.5B base model, a 0.49 billion parameter causal language model developed by the Qwen Team. It is part of the latest Qwen2.5 series, which introduces substantial enhancements over its predecessor, Qwen2. The model is built on a transformer architecture incorporating RoPE, SwiGLU, and RMSNorm, and supports a full context length of 32,768 tokens.
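As a quick orientation, the sketch below loads the model with Hugging Face transformers and cross-checks the figures cited above. The id Qwen/Qwen2.5-0.5B assumes the upstream Hugging Face release; substitute this repository's own id if loading from here. The Qwen2 architecture generally requires transformers>=4.37.0.

```python
# Minimal loading sketch; repo id is an assumption (upstream Hugging Face release).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Cross-check the specs cited above.
print(model.config.max_position_embeddings)  # 32768-token context length
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")
```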

Key Capabilities and Improvements

  • Enhanced Knowledge & Specialized Skills: Significantly improved capabilities in coding and mathematics, leveraging specialized expert models.
  • Instruction Following: Demonstrates marked improvements in adhering to instructions and generating structured outputs, including JSON.
  • Long Text Generation: Produces longer coherent outputs, generating up to 8,000 tokens (see the completion sketch after this list).
  • Structured Data Understanding: Improved ability to understand structured data, such as tables.
  • Robustness: More resilient to diverse system prompts, improving role-play implementation and condition-setting for chatbots.
  • Multilingual Support: Offers support for over 29 languages, including major global languages like Chinese, English, French, Spanish, and Japanese.
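Because this is a base model, the natural smoke test is plain text completion rather than chat. The sketch below is a minimal example: the prompt is hypothetical, the sampling settings (temperature, top_p, max_new_tokens) are illustrative rather than recommended, and the Qwen/Qwen2.5-0.5B id again assumes the upstream release.

```python
# Minimal completion sketch; prompt and sampling settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Base models continue text; give a prefix, not a chat turn.
prompt = "Rotary position embeddings work by"
inputs = tokenizer(prompt, return_tensors="pt")

# The card cites generation of up to 8,000 tokens; keep this short for a smoke test.
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```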

Intended Use

As a pretrained base model, Qwen2.5-0.5B is not recommended for direct conversational use. Instead, it is designed as a robust foundation for post-training techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining; a minimal SFT sketch follows below. Detailed evaluation results and further information are available in the official blog post and GitHub repository.
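For illustration, here is a minimal supervised fine-tuning sketch using the Hugging Face Trainer. It is not the Qwen team's training recipe: the sft_data.jsonl file, its "text" column, and all hyperparameters are hypothetical placeholders, and a real SFT run would typically also apply a chat template and mask prompt tokens in the loss.

```python
# Minimal SFT sketch with Hugging Face Trainer; dataset path, column name,
# and hyperparameters are hypothetical placeholders, not recommendations.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Defensive: ensure a pad token exists for batching.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Hypothetical instruction dataset with a "text" column.
dataset = load_dataset("json", data_files="sft_data.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2.5-0.5b-sft",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
    ),
    train_dataset=tokenized,
    # mlm=False yields standard causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```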