noenoenoe123/Qwen2.5-0.5B-Instruct
noenoenoe123/Qwen2.5-0.5B-Instruct is a 0.49-billion-parameter instruction-tuned causal language model from the Qwen2.5 series, developed by the Qwen team. It offers a 32,768-token context length, incorporates significantly more knowledge than its predecessor, and brings improved coding and mathematics capabilities along with better instruction following. The model excels at generating long texts, understanding structured data, and producing structured outputs such as JSON, with multilingual support for more than 29 languages.
Qwen2.5-0.5B-Instruct Overview
This model is the instruction-tuned 0.5 billion parameter variant from the Qwen2.5 series, developed by Qwen. It builds upon the Qwen2 architecture with significant enhancements across several key areas. The model utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
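To make one of these components concrete, RMSNorm normalizes activations by their root-mean-square instead of subtracting the mean and dividing by the standard deviation as LayerNorm does. A minimal NumPy sketch (the real implementation lives in the transformers Qwen2 modeling code; the `rms_norm` helper and sample values here are purely illustrative):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Scale by the root-mean-square of the last axis, then apply a
    # learned per-channel gain. No mean subtraction, unlike LayerNorm.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

hidden = np.array([1.0, -2.0, 3.0, -4.0])
weight = np.ones_like(hidden)        # learned scale; identity here
out = rms_norm(hidden, weight)
print(np.sqrt(np.mean(out * out)))   # ≈ 1.0: output has unit RMS
```

With an identity weight, the output's RMS is (up to `eps`) exactly 1, which is the invariant the layer enforces.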
Key Capabilities & Improvements
- Enhanced Knowledge & Reasoning: Incorporates significantly more knowledge, with greatly improved capabilities in coding and mathematics due to specialized expert models.
- Instruction Following: Shows significant gains in following instructions and is more resilient to diverse system prompts, improving role-play and condition-setting for chatbots.
- Long-Context & Generation: Supports a full context length of 32,768 tokens and can generate up to 8,192 tokens, making it suitable for long-form content.
- Structured Data Handling: Excels at understanding structured data (e.g., tables) and generating structured outputs, particularly JSON.
- Multilingual Support: Offers robust support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, and Vietnamese.
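The instruction-tuned Qwen2.5 models expect chat turns in a ChatML-style prompt format. In practice you would let `tokenizer.apply_chat_template()` from transformers build this string, but the layout can be sketched by hand (the `build_chatml_prompt` helper below is hypothetical; only the special tokens follow the documented format):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    # ChatML-style turns: each message is wrapped in
    # <|im_start|>{role}\n ... <|im_end|>, and the prompt ends with an
    # open assistant turn for the model to complete.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Return the capital of France as JSON.",
)
print(prompt)
```

Ending the string with an open `assistant` turn is what tells the model to generate the reply rather than another user message.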
Model Specifications
- Parameters: 0.49 billion (0.36 billion non-embedding)
- Layers: 24
- Attention Heads (GQA): 14 for Q, 2 for KV
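The grouped-query attention split above has a direct memory payoff: only the 2 KV heads are stored in the KV cache, so the cache is 2/14 ≈ 1/7 the size of an equivalent full multi-head setup. A back-of-the-envelope sketch, assuming a head dimension of 64 (not stated in the specs above) and fp16 storage:

```python
# GQA KV-cache estimate for the specs listed above.
q_heads, kv_heads = 14, 2      # from the spec list
layers = 24
head_dim = 64                  # assumption: not given in the model card

# The cache holds keys+values only for the KV heads.
ratio = kv_heads / q_heads
print(f"KV cache vs. full MHA: {ratio:.3f} (1/{q_heads // kv_heads})")

# Approximate fp16 cache size at the full 32,768-token context:
# 2 tensors (K and V) x layers x kv_heads x head_dim x tokens x 2 bytes.
tokens, bytes_per_el = 32_768, 2
cache_bytes = 2 * layers * kv_heads * head_dim * tokens * bytes_per_el
print(f"~{cache_bytes / 2**20:.0f} MiB for keys+values at 32k tokens")
```

Under these assumptions the full-context cache comes out to a few hundred MiB, which is why the small KV-head count matters for serving long contexts on modest hardware.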
This model is designed for developers seeking a compact yet powerful instruction-tuned LLM with strong multilingual, coding, and structured output generation capabilities.