husheng12345/Qwen2.5-32B-Instruct
husheng12345/Qwen2.5-32B-Instruct is a 32.5 billion parameter instruction-tuned causal language model developed by Qwen, built on a transformer architecture with RoPE, SwiGLU, and RMSNorm. It offers significantly improved capabilities in coding, mathematics, and instruction following, along with long-text generation of up to 8K tokens and structured outputs such as JSON. The model supports a 131,072-token context length and covers more than 29 languages, making it suitable for a wide range of complex language understanding and generation tasks.
Qwen2.5-32B-Instruct Overview
This model is the instruction-tuned 32.5 billion parameter variant from the Qwen2.5 series, developed by Qwen. It builds upon the Qwen2 architecture, incorporating improvements across several key areas. The model utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
Key Capabilities and Improvements
- Enhanced Knowledge & Reasoning: Significantly improved performance in coding and mathematics, leveraging specialized expert models.
- Instruction Following: Demonstrates substantial advancements in adhering to instructions and generating long texts (up to 8K tokens).
- Structured Data Handling: Better at understanding structured data (e.g., tables) and generating structured outputs, particularly JSON.
- Robustness: More resilient to diverse system prompts, improving role-play and chatbot condition-setting.
- Long-Context Support: Features a full context length of 131,072 tokens, with generation capabilities up to 8,192 tokens. It can be configured with YaRN for handling even longer texts.
- Multilingual Support: Supports over 29 languages, including major global languages like Chinese, English, French, Spanish, German, Japanese, and Korean.
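A minimal quick-start sketch for chatting with the model via Hugging Face `transformers` is shown below. The prompt text is illustrative, and the imports are kept inside the function because actually loading a 32B model requires substantial GPU memory; the chat-template flow itself follows standard `transformers` usage.

```python
MODEL_NAME = "husheng12345/Qwen2.5-32B-Instruct"

def generate_reply(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate one chat turn (requires a large GPU)."""
    # Imported here so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype="auto", device_map="auto"
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]
    # Render the conversation with the model's chat template, leaving room
    # for the assistant turn to be generated.
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated reply.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```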
Technical Specifications
- Parameters: 32.5 billion (31.0 billion non-embedding)
- Layers: 64
- Attention Heads (GQA): 40 for Q, 8 for KV
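The GQA figures above translate directly into KV-cache savings. A back-of-the-envelope calculation, assuming a head dimension of 128 (not stated in the card, but consistent with 40 heads at a typical hidden size of 5120) and fp16 cache entries:

```python
# Specs from the card: 64 layers, 40 query heads, 8 KV heads (GQA).
NUM_LAYERS = 64
NUM_Q_HEADS = 40
NUM_KV_HEADS = 8
HEAD_DIM = 128   # assumed, not stated in the card
BYTES_FP16 = 2

# GQA caches only the 8 KV heads, so the cache is 8/40 = 1/5 of full MHA.
kv_fraction = NUM_KV_HEADS / NUM_Q_HEADS

# Per token: one K and one V tensor for every layer, in fp16.
bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_FP16

# At the full 131,072-token context length:
total_gib = bytes_per_token * 131_072 / 2**30

print(kv_fraction)      # 0.2 -> 5x smaller KV cache than full MHA
print(bytes_per_token)  # 262144 bytes = 256 KiB per token
print(total_gib)        # 32.0 GiB for one full-length sequence
```

Under these assumptions, a single full-length sequence needs roughly 32 GiB of KV cache; with 40 KV heads instead of 8, it would be five times that.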
Usage Notes
For processing texts longer than 32,768 tokens, enable YaRN via the rope_scaling field in config.json. Deployment with vLLM is recommended; note, however, that vLLM currently implements static YaRN, meaning the scaling factor stays fixed regardless of input length, which may degrade performance on shorter texts if YaRN is enabled globally.
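As a sketch, the rope_scaling entry takes roughly the following shape, where factor is the target length divided by the native 32,768-token window (131,072 / 32,768 = 4.0); consult the upstream Qwen2.5 documentation for the exact fields your serving stack expects:

```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```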