noenoenoe123/Qwen2.5-0.5B-Instruct
noenoenoe123/Qwen2.5-0.5B-Instruct is a 0.49-billion-parameter instruction-tuned causal language model from the Qwen2.5 series, developed by the Qwen team. It offers a 32,768-token context length, incorporates significantly more knowledge than its predecessor, and brings improved coding and mathematics capabilities along with better instruction following. The model excels at generating long texts, understanding structured data, and producing structured outputs such as JSON, with multilingual support for more than 29 languages.
Qwen2.5-0.5B-Instruct Overview
This model is the instruction-tuned 0.5 billion parameter variant from the Qwen2.5 series, developed by Qwen. It builds upon the Qwen2 architecture with significant enhancements across several key areas. The model utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
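To make one of these components concrete, RMSNorm normalizes activations by their root-mean-square instead of subtracting the mean and dividing by the standard deviation as LayerNorm does. A minimal NumPy sketch (the real implementation lives in the transformers Qwen2 modeling code; the `rms_norm` helper and sample values here are purely illustrative):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Scale by the root-mean-square of the last axis, then apply a
    # learned per-channel gain. No mean subtraction, unlike LayerNorm.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

hidden = np.array([1.0, -2.0, 3.0, -4.0])
weight = np.ones_like(hidden)        # learned scale; identity here
out = rms_norm(hidden, weight)
print(np.sqrt(np.mean(out * out)))   # ≈ 1.0: output has unit RMS
```

With an identity weight, the output's RMS is (up to `eps`) exactly 1, which is the invariant the layer enforces.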
Key Capabilities & Improvements
- Enhanced Knowledge & Reasoning: Incorporates significantly more knowledge, with greatly improved capabilities in coding and mathematics due to specialized expert models.
- Instruction Following: Shows significant gains in following instructions and is more resilient to diverse system prompts, improving role-play and condition-setting for chatbots.
- Long-Context & Generation: Supports a full context length of 32,768 tokens and can generate up to 8,192 tokens, making it suitable for long-form content.
- Structured Data Handling: Excels at understanding structured data (e.g., tables) and generating structured outputs, particularly JSON.
- Multilingual Support: Offers robust support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, and Vietnamese.
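The instruction-tuned Qwen2.5 models expect chat turns in a ChatML-style prompt format. In practice you would let `tokenizer.apply_chat_template()` from transformers build this string, but the layout can be sketched by hand (the `build_chatml_prompt` helper below is hypothetical; only the special tokens follow the documented format):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    # ChatML-style turns: each message is wrapped in
    # <|im_start|>{role}\n ... <|im_end|>, and the prompt ends with an
    # open assistant turn for the model to complete.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Return the capital of France as JSON.",
)
print(prompt)
```

Ending the string with an open `assistant` turn is what tells the model to generate the reply rather than another user message.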
Model Specifications
- Parameters: 0.49 billion (0.36 billion non-embedding)
- Layers: 24
- Attention Heads (GQA): 14 for Q, 2 for KV
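The grouped-query attention split above has a direct memory payoff: only the 2 KV heads are stored in the KV cache, so the cache is 2/14 ≈ 1/7 the size of an equivalent full multi-head setup. A back-of-the-envelope sketch, assuming a head dimension of 64 (not stated in the specs above) and fp16 storage:

```python
# GQA KV-cache estimate for the specs listed above.
q_heads, kv_heads = 14, 2      # from the spec list
layers = 24
head_dim = 64                  # assumption: not given in the model card

# The cache holds keys+values only for the KV heads.
ratio = kv_heads / q_heads
print(f"KV cache vs. full MHA: {ratio:.3f} (1/{q_heads // kv_heads})")

# Approximate fp16 cache size at the full 32,768-token context:
# 2 tensors (K and V) x layers x kv_heads x head_dim x tokens x 2 bytes.
tokens, bytes_per_el = 32_768, 2
cache_bytes = 2 * layers * kv_heads * head_dim * tokens * bytes_per_el
print(f"~{cache_bytes / 2**20:.0f} MiB for keys+values at 32k tokens")
```

Under these assumptions the full-context cache comes out to a few hundred MiB, which is why the small KV-head count matters for serving long contexts on modest hardware.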
This model is designed for developers seeking a compact yet powerful instruction-tuned LLM with strong multilingual, coding, and structured output generation capabilities.