husheng12345/Qwen2.5-32B-Instruct
husheng12345/Qwen2.5-32B-Instruct is a 32.5 billion parameter instruction-tuned causal language model developed by Qwen, built on a transformer architecture with RoPE, SwiGLU, and RMSNorm. It offers significantly improved capabilities in coding, mathematics, and instruction following, along with long-text generation of up to 8K tokens and structured outputs such as JSON. The model supports a 131,072-token context length and covers more than 29 languages, making it suitable for a wide range of complex language understanding and generation tasks.
Qwen2.5-32B-Instruct Overview
This model is the instruction-tuned 32.5 billion parameter variant from the Qwen2.5 series, developed by Qwen. It builds upon the Qwen2 architecture, incorporating improvements across several key areas. The model utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
Key Capabilities and Improvements
- Enhanced Knowledge & Reasoning: Significantly improved performance in coding and mathematics, leveraging specialized expert models.
- Instruction Following: Demonstrates substantial advancements in adhering to instructions and generating long texts (up to 8K tokens).
- Structured Data Handling: Better at understanding structured data (e.g., tables) and generating structured outputs, particularly JSON.
- Robustness: More resilient to diverse system prompts, improving role-play and chatbot condition-setting.
- Long-Context Support: Features a full context length of 131,072 tokens, with generation capabilities up to 8,192 tokens. It can be configured with YaRN for handling even longer texts.
- Multilingual Support: Supports over 29 languages, including major global languages like Chinese, English, French, Spanish, German, Japanese, and Korean.
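A minimal quick-start sketch for chatting with the model via Hugging Face `transformers` is shown below. The prompt text is illustrative, and the imports are kept inside the function because actually loading a 32B model requires substantial GPU memory; the chat-template flow itself follows standard `transformers` usage.

```python
MODEL_NAME = "husheng12345/Qwen2.5-32B-Instruct"

def generate_reply(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate one chat turn (requires a large GPU)."""
    # Imported here so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype="auto", device_map="auto"
    )
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]
    # Render the conversation with the model's chat template, leaving room
    # for the assistant turn to be generated.
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated reply.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```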
Technical Specifications
- Parameters: 32.5 billion (31.0 billion non-embedding)
- Layers: 64
- Attention Heads (GQA): 40 for Q, 8 for KV
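The GQA figures above translate directly into KV-cache savings. A back-of-the-envelope calculation, assuming a head dimension of 128 (not stated in the card, but consistent with 40 heads at a typical hidden size of 5120) and fp16 cache entries:

```python
# Specs from the card: 64 layers, 40 query heads, 8 KV heads (GQA).
NUM_LAYERS = 64
NUM_Q_HEADS = 40
NUM_KV_HEADS = 8
HEAD_DIM = 128   # assumed, not stated in the card
BYTES_FP16 = 2

# GQA caches only the 8 KV heads, so the cache is 8/40 = 1/5 of full MHA.
kv_fraction = NUM_KV_HEADS / NUM_Q_HEADS

# Per token: one K and one V tensor for every layer, in fp16.
bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_FP16

# At the full 131,072-token context length:
total_gib = bytes_per_token * 131_072 / 2**30

print(kv_fraction)      # 0.2 -> 5x smaller KV cache than full MHA
print(bytes_per_token)  # 262144 bytes = 256 KiB per token
print(total_gib)        # 32.0 GiB for one full-length sequence
```

Under these assumptions, a single full-length sequence needs roughly 32 GiB of KV cache; with 40 KV heads instead of 8, it would be five times that.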
Usage Notes
For processing texts longer than 32,768 tokens, enable YaRN via the rope_scaling field in config.json. Deployment with vLLM is recommended; note, however, that vLLM currently implements static YaRN, meaning the scaling factor stays fixed regardless of input length, which may degrade performance on shorter texts if YaRN is enabled globally.
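As a sketch, the rope_scaling entry takes roughly the following shape, where factor is the target length divided by the native 32,768-token window (131,072 / 32,768 = 4.0); consult the upstream Qwen2.5 documentation for the exact fields your serving stack expects:

```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```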