Sheelu1246/Qwen2.5-0.5B

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 30, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Sheelu1246/Qwen2.5-0.5B is a 0.49 billion parameter causal language model from the Qwen2.5 series, developed by the Qwen team. This base model uses a transformer architecture with RoPE, SwiGLU, and RMSNorm, and supports a 32,768 token context length. It offers significant improvements in knowledge, coding, mathematics, and instruction following over its predecessor, Qwen2. Released as a pretrained base model, it is intended as a foundation for fine-tuning toward specific applications.


Qwen2.5-0.5B Overview

Sheelu1246/Qwen2.5-0.5B is a base causal language model from the Qwen2.5 series, with 0.49 billion parameters and a 32,768 token context length. Developed by the Qwen team, it builds upon the Qwen2 architecture with notable enhancements across several key areas. As a pretraining foundation, it is intended for further post-training such as SFT or RLHF rather than direct conversational use.
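Since this is a standard Qwen2.5 checkpoint, it can be loaded with Hugging Face `transformers`. The sketch below is a minimal, hedged quickstart (it assumes the repo ships the usual Qwen2 config and tokenizer files); note that, as a base model, it should be given a plain text prefix to continue, not a chat template.

```python
# Minimal quickstart sketch (assumes standard Qwen2.5 weights/tokenizer in the repo).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sheelu1246/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Base model: plain text continuation, no chat template.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model has not been instruction-tuned, expect raw continuations rather than answers; apply SFT or RLHF before using it conversationally.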

Key Capabilities & Improvements

  • Enhanced Knowledge & Reasoning: Significantly improved capabilities in coding and mathematics, leveraging specialized expert models.
  • Instruction Following: Substantial advancements in adhering to instructions and generating long texts (over 8K tokens).
  • Structured Data Handling: Better understanding of structured data, such as tables, and improved generation of structured outputs, particularly JSON.
  • System Prompt Resilience: More robust to diverse system prompts, which benefits role-play and chatbot condition-setting.
  • Multilingual Support: Offers support for over 29 languages, including major global languages like Chinese, English, French, Spanish, and Japanese.

Architecture & Training

This model uses a transformer architecture incorporating RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings. It consists of 24 layers and 14 query attention heads, with 2 key-value heads via grouped-query attention (GQA). Detailed evaluation results and performance benchmarks are available in the official Qwen2.5 blog.
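One practical consequence of GQA is a much smaller KV cache at long context. The arithmetic below sketches this for the 32k context, assuming the published Qwen2.5-0.5B config values (head dimension 64, i.e. hidden size 896 / 14 heads) and 2 bytes per value in BF16:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of the KV cache: K and V tensors for every layer, at BF16 (2 bytes)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Assumed Qwen2.5-0.5B config: 24 layers, 14 query heads, 2 KV heads, head_dim 64.
gqa = kv_cache_bytes(num_layers=24, num_kv_heads=2, head_dim=64, seq_len=32768)
mha = kv_cache_bytes(num_layers=24, num_kv_heads=14, head_dim=64, seq_len=32768)

print(f"GQA (2 KV heads): {gqa / 2**20:.0f} MiB")   # 384 MiB
print(f"MHA (14 KV heads): {mha / 2**20:.0f} MiB")  # 2688 MiB
```

At full 32k context, sharing 2 KV heads across 14 query heads cuts the cache 7x, which matters for a model meant to run in small memory footprints.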

When to Use This Model

This 0.5B base model is a good fit for developers who need a compact, capable foundation to build upon. It is particularly suited to custom fine-tuning for specific domains, or to applications that can leverage its improved coding, mathematical, and instruction-following capabilities. It is not recommended for direct conversational use without further fine-tuning.