Qwen/Qwen2.5-3B-Instruct

Parameters: 3.1B
Precision: BF16
Context length: 32,768 tokens
License: qwen-research

Qwen2.5-3B-Instruct Overview

Qwen2.5-3B-Instruct is an instruction-tuned model from the Qwen2.5 series, developed by the Qwen team. It is a 3.09-billion-parameter causal language model built on a transformer architecture with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings. It supports a context length of 32,768 tokens and can generate up to 8,192 tokens, making it suitable for tasks requiring long inputs and outputs.
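Qwen2.5 chat models consume conversations in the ChatML turn format. In practice you would let `tokenizer.apply_chat_template` from the `transformers` library render this for you; the sketch below reproduces the turn layout by hand for illustration only, and the message contents are hypothetical.

```python
# Hand-rolled sketch of the ChatML turn format used by Qwen2.5 chat models.
# In real code, prefer tokenizer.apply_chat_template from transformers.

def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts into ChatML-style turns."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    # A trailing assistant header cues the model to begin its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give a short introduction to large language models."},
]
prompt = build_chatml_prompt(messages)
```

The resulting string is what the tokenizer would encode and pass to the model for generation.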

Key Capabilities

  • Enhanced Knowledge & Reasoning: Significantly improved capabilities in coding and mathematics, leveraging specialized expert models.
  • Advanced Instruction Following: Demonstrates strong performance in adhering to instructions and understanding diverse system prompts, beneficial for role-play and chatbot applications.
  • Structured Data & Output: Excels at understanding structured data (e.g., tables) and generating structured outputs, particularly JSON.
  • Long Text Generation: Capable of generating long texts exceeding 8,000 tokens.
  • Multilingual Support: Provides robust support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
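When using the model for structured (JSON) output, replies sometimes arrive wrapped in a markdown code fence. A minimal post-processing sketch, assuming a hypothetical `extract_json` helper and an example reply string, tolerates both fenced and bare JSON:

```python
# Hypothetical post-processing sketch: strip an optional ```json fence
# from a model reply before parsing the JSON object inside it.
import json
import re

def extract_json(reply: str):
    """Parse the first JSON object from a model reply, tolerating ``` fences."""
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", reply, re.DOTALL)
    text = fenced.group(1) if fenced else reply
    return json.loads(text)

# Example reply a JSON-prompted model might produce (illustrative only).
reply = '```json\n{"name": "Qwen2.5-3B-Instruct", "context_length": 32768}\n```'
data = extract_json(reply)
```

Validating parsed output this way is cheap insurance even for a model that reliably emits JSON.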

Good for

  • Applications requiring strong coding and mathematical reasoning at a smaller parameter count.
  • Chatbots and assistants needing robust instruction following and role-play behavior.
  • Tasks involving structured data processing and JSON output generation.
  • Generating long-form content or handling extensive conversational turns.
  • Multilingual applications targeting a broad range of languages.