unsloth/Qwen2.5-14B

Status: Warm
Visibility: Public
Parameters: 14.8B
Tensor type: FP8
Context: 32768
Released: Sep 18, 2024
License: apache-2.0
Source: Hugging Face

unsloth/Qwen2.5-14B is a 14.7-billion-parameter causal language model from the Qwen2.5 series, developed by the Qwen team. This base model uses a transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias, and supports a context length of 131,072 tokens. It offers significantly improved capabilities in coding, mathematics, instruction following, and long-text generation, with multilingual support for more than 29 languages. As a base model, it is intended for further fine-tuning rather than direct conversational use.

Qwen2.5-14B Overview

This model is the 14.7-billion-parameter base version of the Qwen2.5 series, developed by the Qwen team. It builds on the Qwen2 generation with significant enhancements across several key areas. The model uses a transformer architecture incorporating RoPE, SwiGLU, RMSNorm, and attention QKV bias, and supports an extensive context length of 131,072 tokens.
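The architecture details above can be verified directly from the repository's `config.json`. A small sketch using transformers' `AutoConfig` (the helper names are mine; the field names are those of transformers' Qwen2 config class):

```python
def summarize_architecture(cfg) -> str:
    """Render the key transformer settings from a Qwen2-style config object."""
    return (
        f"layers={cfg.num_hidden_layers}, heads={cfg.num_attention_heads}, "
        f"kv_heads={cfg.num_key_value_heads}, "
        f"max_positions={cfg.max_position_embeddings}, rope_theta={cfg.rope_theta}"
    )


def fetch_and_summarize(repo_id: str = "unsloth/Qwen2.5-14B") -> str:
    """Download only config.json from the Hub and summarize it.

    The import is kept local so reading (or testing) this sketch does not
    require transformers to be installed or any network access.
    """
    from transformers import AutoConfig

    return summarize_architecture(AutoConfig.from_pretrained(repo_id))
```

Calling `fetch_and_summarize()` fetches just the small config file, not the 14B weights, so it is a cheap way to confirm settings like `max_position_embeddings` before committing to a download.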

Key Capabilities and Improvements

  • Enhanced Knowledge and Specialized Skills: Features significantly more knowledge and greatly improved capabilities in coding and mathematics, benefiting from specialized expert models.
  • Instruction Following and Text Generation: Demonstrates significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and producing structured outputs like JSON.
  • Robustness: More resilient to diverse system prompts, improving role-play and condition-setting in chatbot deployments.
  • Long-Context Support: Supports context lengths up to 128K tokens and can generate up to 8K tokens.
  • Multilingual Support: Offers comprehensive multilingual capabilities across more than 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
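Taken together, the long-context and generation limits above imply a simple prompt budget. A minimal sketch (the constant and helper names are mine; the limits are the 131,072-token context window and 8K generation cap listed above):

```python
CONTEXT_LENGTH = 131_072   # full context window, in tokens
MAX_NEW_TOKENS = 8_192     # maximum generation length, in tokens


def max_prompt_tokens(reserved_for_output: int = MAX_NEW_TOKENS) -> int:
    """Largest prompt that still leaves room for a full-length completion."""
    return CONTEXT_LENGTH - reserved_for_output


print(max_prompt_tokens())  # 122880
```

In practice, a serving stack would tokenize the prompt and truncate or refuse inputs longer than this budget before calling `generate`.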

Intended Use

unsloth/Qwen2.5-14B is a base model and is not recommended for direct conversational use. Instead, it is designed as a foundation for further post-training, such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), or continued pretraining. For more detailed information, refer to the Qwen2.5 blog and GitHub repository.
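As one illustration of the SFT path, here is a minimal LoRA setup sketch using transformers and peft (not part of this card; the hyperparameters are illustrative, and the target module names follow the Qwen2 attention projection layers):

```python
# Illustrative LoRA hyperparameters -- tune for your task and hardware.
LORA_HYPERPARAMS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    # Qwen2-style attention projection module names.
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}


def load_for_sft(model_name: str = "unsloth/Qwen2.5-14B"):
    """Load the base model and attach a LoRA adapter for supervised fine-tuning.

    Heavy imports stay inside the function so the sketch can be read and the
    hyperparameters inspected without downloading the 14B weights.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )
    peft_model = get_peft_model(
        model, LoraConfig(task_type="CAUSAL_LM", **LORA_HYPERPARAMS)
    )
    return peft_model, tokenizer
```

The returned PEFT model trains only the small adapter matrices, which is the usual way to fine-tune a model of this size on a single GPU; the same loading pattern also works as the starting point for RLHF pipelines.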