Qwen2.5-0.5B-Instruct: Enhanced Small-Scale LLM

This model is the instruction-tuned 0.5 billion parameter variant of the Qwen2.5 series, developed by the Qwen Team. It builds upon the Qwen2 architecture, featuring transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings. With 24 layers and 14 attention heads (GQA), it processes a full context length of 32,768 tokens and can generate up to 8,192 tokens.

Key Capabilities and Improvements

Enhanced Knowledge & Reasoning: Significantly improved capabilities in coding and mathematics, leveraging specialized expert models.
Instruction Following: Demonstrates substantial advancements in adhering to instructions and generating long, coherent texts (over 8K tokens).
Structured Data Handling: Excels at understanding structured data, including tables, and generating structured outputs like JSON.
Robust System Prompt Resilience: More resilient to diverse system prompts, enhancing role-play and chatbot condition-setting.
Multilingual Support: Offers comprehensive support for over 29 languages, including major global languages like Chinese, English, French, Spanish, German, and Japanese.

Use Cases

This model is well-suited for applications requiring efficient, small-scale language processing with strong instruction following and multilingual capabilities. Its improvements in coding, mathematics, and structured output generation make it valuable for tasks where precise, formatted responses are crucial, even within a constrained parameter budget.

Overview

Qwen2.5-0.5B-Instruct: Enhanced Small-Scale LLM

Key Capabilities and Improvements

Use Cases

Full Model Card (README)