cjc999/Qwen2.5-14B

Text Generation
Concurrency Cost: 1
Model Size: 14.8B
Quant: FP8
Ctx Length: 32k
Published: Apr 25, 2026
License: apache-2.0
Architecture: Transformer
Open Weights, Warm

Qwen2.5-14B is a 14.7 billion parameter causal language model developed by the Qwen team, built on a transformer architecture with RoPE, SwiGLU, and RMSNorm. This base model, part of the Qwen2.5 series, offers significantly improved knowledge, coding, and mathematics capabilities compared to its predecessor, Qwen2. It supports a context length of up to 131,072 tokens. It is a pretrained base model, so further fine-tuning is recommended before using it in conversational applications.

Qwen2.5-14B Overview

Qwen2.5-14B is a 14.7 billion parameter base causal language model from the Qwen2.5 series, developed by the Qwen Team. It builds upon the Qwen2 architecture, incorporating improvements in several key areas.

Key Capabilities & Improvements

  • Enhanced Knowledge & Reasoning: Significantly improved capabilities in general knowledge, coding, and mathematics, leveraging specialized expert models.
  • Instruction Following: Offers substantial improvements in following instructions and generating long texts (over 8K tokens).
  • Structured Data & Output: Better understanding of structured data like tables and improved generation of structured outputs, including JSON.
  • Robustness: More resilient to diverse system prompts, which improves role-play and condition-setting for chatbots.
  • Long Context: Supports an extensive context length of up to 131,072 tokens.
  • Multilingual Support: Supports more than 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
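
Since structured output such as JSON is listed among the capabilities, downstream code usually still needs to pull the JSON out of free-form generated text and validate it. The following is a minimal sketch using only the Python standard library; the helper name `extract_first_json_object` is hypothetical and is not part of any Qwen tooling.

```python
import json
from typing import Any, Optional


def extract_first_json_object(text: str) -> Optional[Any]:
    """Return the first valid JSON object embedded in `text`, or None.

    Hypothetical helper: a base model often surrounds JSON with extra
    text, so we scan for '{' and try to decode from each candidate.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it in the string.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None
```

For example, `extract_first_json_object('Result: {"name": "Qwen", "layers": 48} end')` yields a dictionary with the two keys, while text without any valid object yields `None`.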

Model Architecture

This base model uses a transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias. It has 48 layers and uses grouped-query attention (GQA) with 40 query heads and 8 key/value heads.
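
To make the head counts concrete, the sketch below works out how many query heads share each key/value head and roughly how much memory the KV cache needs. The layer and head counts come from the description above; the head dimension of 128 is an assumption that this page does not state, and the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    num_layers: int = 48    # stated in the model description
    num_q_heads: int = 40   # stated in the model description
    num_kv_heads: int = 8   # stated in the model description
    head_dim: int = 128     # ASSUMPTION: not stated on this page


def queries_per_kv_head(shape: AttentionShape) -> int:
    """Number of query heads that share one key/value head."""
    if shape.num_q_heads % shape.num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    return shape.num_q_heads // shape.num_kv_heads


def kv_cache_bytes(shape: AttentionShape, num_tokens: int,
                   bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence.

    The leading factor 2 covers keys and values; bytes_per_value=2
    corresponds to 16-bit storage, 1 to an 8-bit cache.
    """
    per_token = (2 * shape.num_layers * shape.num_kv_heads
                 * shape.head_dim * bytes_per_value)
    return per_token * num_tokens


shape = AttentionShape()
print(queries_per_kv_head(shape))               # 5
print(kv_cache_bytes(shape, 1))                 # 196608 bytes (192 KiB) per token
print(kv_cache_bytes(shape, 131_072) / 2**30)   # 24.0 GiB at the full context length
```

Under these assumptions, caching 8 key/value heads instead of 40 cuts the cache to one fifth of what full multi-head attention would need, which matters most at long context lengths.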

Usage Recommendation

As a base language model, Qwen2.5-14B is a pretrained checkpoint rather than a chat model, and it is not recommended for direct conversational use. For conversational or instruction-following applications, apply post-training techniques such as SFT (Supervised Fine-Tuning), RLHF (Reinforcement Learning from Human Feedback), or continued pretraining. For detailed evaluation results and further information, refer to the official Qwen2.5 blog and GitHub repository.
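
Because a base model continues text rather than answering chat turns, one common way to use it without fine-tuning is a completion-style few-shot prompt. The sketch below assumes the Hugging Face `transformers` library (with `accelerate` installed for `device_map="auto"`) and the repository ID `Qwen/Qwen2.5-14B`; neither is stated on this page, and the helper names and prompt labels are illustrative.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(examples: Sequence[Tuple[str, str]], query: str,
                          input_label: str = "Input",
                          output_label: str = "Output") -> str:
    """Format worked examples plus an open-ended final item for completion."""
    blocks = [f"{input_label}: {x}\n{output_label}: {y}" for x, y in examples]
    # Leave the last output empty so the model continues from there.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


def complete(prompt: str, model_id: str = "Qwen/Qwen2.5-14B",
             max_new_tokens: int = 32) -> str:
    """Generate a continuation. ASSUMPTION: model_id is the upstream repo."""
    # Imported lazily so the prompt helper works without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        [("cat", "chat"), ("dog", "chien")], "house",
        input_label="English", output_label="French",
    )
    print(complete(prompt))
```

A base model may keep generating further example pairs after the answer, so callers typically cut the output at the first blank line or cap `max_new_tokens`.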