yipchifai/Qwen2.5-1.5B-Instruct
yipchifai/Qwen2.5-1.5B-Instruct is a 1.54-billion-parameter instruction-tuned causal language model from the Qwen2.5 series, developed by Qwen. It supports a 32,768-token context length and offers significant improvements over Qwen2 in coding, mathematics, and instruction following. The model excels at generating long texts, understanding structured data such as tables, and producing structured outputs such as JSON.
Qwen2.5-1.5B-Instruct Overview
yipchifai/Qwen2.5-1.5B-Instruct is an instruction-tuned variant of the Qwen2.5 series, a family of large language models developed by Qwen. This specific model has 1.54 billion parameters, supports a context length of 32,768 tokens, and can generate up to 8,192 tokens. It is built on a transformer architecture incorporating RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings.
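A minimal generation sketch using the standard Hugging Face transformers chat-template workflow: the model id comes from this card, while the prompt, decoding settings, and `device_map="auto"` (which requires the accelerate package) are illustrative assumptions rather than requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "yipchifai/Qwen2.5-1.5B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain Grouped Query Attention in two sentences."},
]
# Render the chat template into a single prompt string, then tokenize it.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens so only the newly generated reply is decoded.
reply = tokenizer.decode(
    generated[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(reply)
```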
Key Capabilities & Improvements
Qwen2.5 models, including this 1.5B instruction-tuned version, offer significant enhancements over the earlier Qwen2 series:
- Enhanced Knowledge & Reasoning: Demonstrates greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
- Instruction Following: Shows significant improvements in adhering to instructions and generating diverse outputs.
- Long Text Generation: Excels at producing extended texts and can generate outputs of more than 8,000 tokens.
- Structured Data Handling: Improved understanding of structured data such as tables, and generation of structured outputs like JSON (see the sketch after this list).
- Robustness to System Prompts: More resilient to varied system prompts, enhancing role-play and condition-setting for chatbots.
- Multilingual Support: Provides support for over 29 languages, including major global languages like Chinese, English, French, Spanish, German, and Japanese.
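Structured output here is prompt-driven rather than grammar-constrained, so a common pattern is to request JSON in the system prompt and validate the reply afterwards. A quick sketch of that pattern, reusing the `model` and `tokenizer` from the quickstart above; the prompts are hypothetical examples.

```python
import json

messages = [
    {"role": "system", "content": "You answer only with valid JSON."},
    {"role": "user", "content": "List three EU capitals as a JSON array of objects with keys 'city' and 'country'."},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
raw = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

# The model is prompted, not constrained, so parsing can still fail; validate.
try:
    data = json.loads(raw)
except json.JSONDecodeError:
    data = None
print(data)
```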
Architecture Details
This model features 28 layers and uses Grouped Query Attention (GQA) with 12 query heads and 2 key/value heads. The non-embedding parameter count is 1.31 billion.
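Assuming the checkpoint ships a standard Qwen2-style configuration, these numbers can be read directly from the model config; the attribute names below are the usual transformers Qwen2 config fields.

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("yipchifai/Qwen2.5-1.5B-Instruct")
print(cfg.num_hidden_layers)     # 28 transformer layers
print(cfg.num_attention_heads)   # 12 query heads
print(cfg.num_key_value_heads)   # 2 key/value heads (GQA)
# With GQA, each key/value head is shared by a group of query heads:
print(cfg.num_attention_heads // cfg.num_key_value_heads)  # 6 query heads per KV head
```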
Use Cases
Given its strengths, this model is well-suited for applications requiring:
- Instruction-based text generation.
- Code generation and mathematical problem-solving.
- Processing and generating structured data.
- Multilingual conversational AI and content creation.
- Tasks benefiting from long-context understanding and generation (a long-generation sketch follows).
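For the long-generation case, a rough sketch (again reusing `model` and `tokenizer` from the quickstart) only needs `max_new_tokens` raised toward the card's stated 8,192-token generation limit; the prompt is illustrative, and generations this long can be slow and memory-hungry on small GPUs.

```python
messages = [
    {"role": "system", "content": "You are a meticulous technical writer."},
    {"role": "user", "content": "Write a detailed tutorial on unit testing in Python."},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
# 8192 matches the generation ceiling stated on this card; lower it to fit
# your latency and memory budget.
out = model.generate(**inputs, max_new_tokens=8192)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```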