Qwen/Qwen2.5-14B

Visibility: Public
Parameters: 14.8B
Precision: FP8
Context length: 131,072 tokens
License: apache-2.0
Source: Hugging Face

Qwen2.5-14B Overview

Qwen2.5-14B is a 14.7-billion-parameter base causal language model from the Qwen2.5 series, developed by the Qwen team. It builds on the Qwen2 architecture with improvements in several key areas and supports a context length of 131,072 tokens, making it well suited to processing and generating long texts.
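
Because this is a base (non-instruct) checkpoint, the most direct way to try it is plain text completion. Below is a minimal sketch using the Hugging Face transformers library; the prompt and generation settings are illustrative only, and a recent transformers release is assumed.

```python
# Minimal sketch: plain text completion with the base checkpoint (no chat template).
# Assumes a recent transformers release (>=4.37, which added Qwen2 support) and
# enough GPU memory for the 14B weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-14B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the dtype stored in the checkpoint config
    device_map="auto",   # spread layers across available GPUs
)

prompt = "The key ideas behind rotary position embeddings are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# A base model continues text; it is not tuned to follow instructions.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```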

Key Capabilities & Improvements

  • Enhanced Knowledge & Specialized Skills: Substantially more knowledge, with significantly improved capabilities in coding and mathematics thanks to specialized expert models used during training.
  • Instruction Following: Demonstrates better adherence to instructions and is more resilient to diverse system prompts, aiding in role-play and chatbot implementations.
  • Long-Text Generation & Understanding: Improved performance in generating long texts (over 8K tokens) and understanding structured data like tables, including generating structured outputs such as JSON.
  • Multilingual Support: Offers support for over 29 languages, including major global languages like Chinese, English, French, Spanish, German, Japanese, and Korean.
  • Architecture: Utilizes a transformer architecture with RoPE, SwiGLU, RMSNorm, and Attention QKV bias, comprising 48 layers; these figures can be read directly from the published configuration, as shown in the sketch below.
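
The architecture and context-length figures above can be cross-checked against the published checkpoint configuration. The following is a minimal sketch using AutoConfig; the field names assume the standard Qwen2 configuration class in transformers.

```python
# Minimal sketch: read the published config and print the fields behind the
# architecture bullet above. Field names follow the Qwen2 config in transformers.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-14B")

print(config.model_type)               # "qwen2"
print(config.num_hidden_layers)        # 48 transformer layers
print(config.max_position_embeddings)  # 131072-token context window
print(config.num_attention_heads,      # query heads vs. key/value heads
      config.num_key_value_heads)      # (grouped-query attention)
print(config.rope_theta)               # RoPE base frequency
print(config.hidden_act)               # activation used inside the SwiGLU MLP
```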

Good For

  • Further Pretraining and Fine-tuning: As a base model, it is intended for subsequent post-training such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining, rather than direct conversational use; see the fine-tuning sketch after this list.
  • Applications Requiring Long Context: Its 128K token context window is beneficial for tasks demanding extensive input understanding or long-form content generation.
  • Multilingual Applications: Suitable for development in a wide array of languages due to its broad multilingual support.
  • Structured Data Processing: Improved ability to understand and generate structured data, making it useful for tasks involving tables or JSON outputs.
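
To illustrate the post-training path mentioned above, here is a minimal LoRA-based SFT sketch using trl and peft. The dataset, LoRA targets, and hyperparameters are placeholder assumptions rather than anything prescribed by the Qwen2.5 release, and exact SFTTrainer/SFTConfig argument names vary across trl versions.

```python
# Minimal sketch: LoRA supervised fine-tuning (SFT) of the base model with trl + peft.
# Dataset and hyperparameters are placeholders; adjust to your own data and hardware.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="qwen2.5-14b-sft-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-4,
    num_train_epochs=1,
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-14B",  # the trainer loads the checkpoint from this id
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```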