ehristoforu/coolqwen-3b-it

Text Generation · Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: Jan 2, 2025 · License: qwen-research · Architecture: Transformer

The ehristoforu/coolqwen-3b-it model is an instruction-tuned, 3.09-billion-parameter causal language model from the Qwen2.5 series, developed by the Qwen Team. It supports a 32,768-token context length and shows significant improvements in coding, mathematics, and instruction following, drawing on specialized expert models in those domains. It excels at generating long texts, understanding structured data, and producing structured outputs such as JSON, with robust multilingual support covering over 29 languages.


Qwen2.5-3B-Instruct Overview

This model is the instruction-tuned 3.09 billion parameter variant from the Qwen2.5 series, developed by the Qwen Team. It builds upon previous Qwen models with substantial enhancements across several key areas, making it a versatile choice for various NLP tasks.

Key Capabilities & Improvements

  • Enhanced Knowledge & Reasoning: Significantly improved capabilities in coding and mathematics, leveraging specialized expert models.
  • Superior Instruction Following: Demonstrates marked improvements in adhering to instructions and is more resilient to diverse system prompts, aiding in role-play and chatbot implementations.
  • Long Context & Generation: Supports a full context length of 32,768 tokens and can generate outputs up to 8,192 tokens, ideal for complex and extended text tasks.
  • Structured Data Handling: Excels at understanding structured data, such as tables, and at generating structured outputs, particularly JSON (see the usage sketch after this list).
  • Multilingual Support: Offers robust support for over 29 languages, including major global languages like Chinese, English, French, Spanish, German, and Japanese.
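
As a quick illustration, here is a minimal usage sketch built on the standard Hugging Face transformers chat API. The prompt and the generation budget are illustrative assumptions, not values prescribed by the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ehristoforu/coolqwen-3b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative prompt asking the model for structured (JSON) output.
messages = [
    {"role": "system", "content": "You are a helpful assistant. Respond only with valid JSON."},
    {"role": "user", "content": "List three prime numbers with a one-line fact about each."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The card cites a generation limit of 8,192 tokens; we stay well below it here.
output_ids = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```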

Architecture & Training

The model is a causal language model built on the Transformer architecture, incorporating RoPE positional embeddings, SwiGLU activations, RMSNorm, and attention QKV bias. It has 36 layers and 16 query attention heads, with 2 key/value heads in a grouped-query attention (GQA) configuration. The model underwent both pre-training and post-training stages to achieve its enhanced performance.
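
To make the GQA configuration concrete, the sketch below shows how 2 key/value heads can be shared across 16 query heads, mirroring the head counts stated above. The sequence length and head dimension are illustrative assumptions, not values taken from the model config:

```python
import torch

# Head counts from the card: 16 query heads, 2 KV heads (GQA),
# so each KV head is shared by 8 query heads.
num_q_heads, num_kv_heads = 16, 2
head_dim, seq_len = 128, 32  # illustrative dimensions
group_size = num_q_heads // num_kv_heads  # 8 query heads per KV head

q = torch.randn(1, num_q_heads, seq_len, head_dim)
k = torch.randn(1, num_kv_heads, seq_len, head_dim)
v = torch.randn(1, num_kv_heads, seq_len, head_dim)

# Expand each KV head across its group of query heads, then run
# ordinary causal scaled dot-product attention.
k = k.repeat_interleave(group_size, dim=1)  # -> (1, 16, seq_len, head_dim)
v = v.repeat_interleave(group_size, dim=1)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 32, 128])
```

The point of this arrangement is memory efficiency: the KV cache stores only 2 heads instead of 16, which matters for long-context inference at the 32k token limit.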