Qwen2.5-32B-Instruct Overview
This model is the instruction-tuned, 32.5-billion-parameter variant of the Qwen2.5 series, developed by the Qwen team. It builds on the Qwen2 architecture with improvements across several key areas, and uses a transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias.
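To make one of the listed components concrete: RMSNorm normalizes activations by their root-mean-square, with no mean subtraction or bias as in LayerNorm. The following is a minimal NumPy sketch of the idea, not the model's actual implementation (hyperparameters such as `eps` are illustrative):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Scale by the root-mean-square of the last axis; unlike LayerNorm,
    # the mean is not subtracted and no bias is added.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

out = rms_norm(np.array([3.0, 4.0]), np.ones(2))
# RMS of [3, 4] is sqrt((9 + 16) / 2) ≈ 3.5355, so out ≈ [0.8485, 1.1314]
```

In the model itself, `weight` is a learned per-dimension gain applied after normalization.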
Key Capabilities and Improvements
- Enhanced Knowledge & Reasoning: Significantly more knowledge and greatly improved capabilities in coding and mathematics, drawing on specialized expert models in those domains.
- Instruction Following: Demonstrates substantial advancements in adhering to instructions and generating long texts (up to 8K tokens).
- Structured Data Handling: Better at understanding structured data (e.g., tables) and generating structured outputs, particularly JSON.
- Robustness: More resilient to diverse system prompts, improving role-play and condition-setting for chatbots.
- Long-Context Support: Features a full context length of 131,072 tokens and generation of up to 8,192 tokens. Processing inputs longer than 32,768 tokens requires enabling YaRN (see Usage Notes).
- Multilingual Support: Supports over 29 languages, including major global languages like Chinese, English, French, Spanish, German, Japanese, and Korean.
Technical Specifications
- Parameters: 32.5 billion (31.0 billion non-embedding)
- Layers: 64
- Attention Heads (GQA): 40 for Q, 8 for KV
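One practical consequence of GQA is a smaller KV cache: only the 8 KV heads are stored per layer, not all 40 query heads. The arithmetic below is a rough sketch of the per-token cache size; the head dimension of 128 and fp16 (2-byte) cache entries are assumptions, as neither is stated in this card:

```python
def kv_cache_bytes_per_token(layers, kv_heads, head_dim, dtype_bytes=2):
    # Each layer caches one K and one V tensor of kv_heads * head_dim values.
    return layers * 2 * kv_heads * head_dim * dtype_bytes

gqa = kv_cache_bytes_per_token(layers=64, kv_heads=8, head_dim=128)
mha = kv_cache_bytes_per_token(layers=64, kv_heads=40, head_dim=128)
# gqa = 262144 bytes (256 KiB per token); full MHA would be 5x larger
```

Under these assumptions, GQA cuts the KV cache to 8/40 = 1/5 of what full multi-head attention would require, which is what makes the 131K-token context practical to serve.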
Usage Notes
To process texts longer than 32,768 tokens, enable YaRN by adding a rope_scaling entry to config.json. Deployment with vLLM is recommended; note, however, that vLLM currently implements static YaRN, meaning the scaling factor is fixed regardless of input length, which may degrade performance on shorter texts. Enable it only when long-context processing is actually needed.
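A rope_scaling entry along the lines shown in the Qwen2.5 model card is sketched below; the factor 4.0 corresponds to 131,072 / 32,768. Verify the exact field names against the official card before deploying:

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

This fragment is merged into the model's existing config.json alongside its other keys, not used as a standalone file.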