Qwen/Qwen2.5-7B

Status: Warm
Visibility: Public
Parameters: 7.6B
Quantization: FP8
Context length: 32768
Released: Sep 15, 2024
License: apache-2.0
Source: Hugging Face

Qwen/Qwen2.5-7B is a 7.61-billion-parameter causal language model developed by Qwen, built on a transformer architecture with RoPE, SwiGLU, RMSNorm, and attention QKV bias. Compared with its predecessor, Qwen2, this base model offers significantly improved knowledge, coding, and mathematics capabilities. It supports a context length of 131,072 tokens and, as a pretrained base model, is intended as a foundation for further fine-tuning for specific applications rather than for direct deployment.

Overview

Qwen2.5-7B is the 7.61-billion-parameter base causal language model of the Qwen2.5 series, developed by Qwen. It builds on the Qwen2 architecture: a transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias. As a pretrained base model, it is intended as a starting point for post-training such as SFT or RLHF, not for direct conversational use.
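
Because it is a base model, Qwen2.5-7B is prompted for plain-text completion rather than chat. Below is a minimal loading-and-generation sketch using the Hugging Face transformers library; the prompt and generation settings are illustrative, not recommendations from the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" requires the accelerate package; torch_dtype="auto"
# picks the dtype stored in the checkpoint config.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Base-model usage: a plain text prefix to be continued, not a chat template.
prompt = "The key ideas behind the transformer architecture are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```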

Key Capabilities & Improvements

  • Enhanced Knowledge & Reasoning: Significantly improved general knowledge, coding, and mathematics capabilities, benefiting from specialized expert models in these domains.
  • Instruction Following: Demonstrates substantial improvements in adhering to instructions and generating structured outputs, including JSON.
  • Long Text Generation: Better performance in generating extended texts, supporting outputs over 8K tokens.
  • Context Length: Supports a context window of up to 131,072 tokens (see the configuration check after this list).
  • Multilingual Support: Offers support for over 29 languages, including major global languages like Chinese, English, French, Spanish, and Japanese.
  • System Prompt Resilience: More robust to diverse system prompts, aiding in role-play and condition-setting for chatbots.
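
Note that the 32768 in the listing metadata likely reflects this particular deployment's serving window, while 131,072 tokens is the maximum context the model itself is configured for. A quick way to confirm the model-side limit from its published config, as a minimal sketch with the transformers library:

```python
from transformers import AutoConfig

# Read the model's configured maximum context directly from its config.json.
cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B")
print(cfg.max_position_embeddings)  # expect 131072 per the model card
```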

Intended Use

This model is a base language model and is primarily intended for developers to perform post-training steps like supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), or continued pretraining. It is not recommended for direct conversational use without further fine-tuning.
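
As one illustration of that workflow, here is a minimal SFT sketch using the Hugging Face Trainer. The data file sft_data.jsonl, its "text" column, and the hyperparameters are assumptions for the example, not settings from the model card; full fine-tuning of a 7.6B-parameter model at this configuration also assumes substantial GPU memory.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "Qwen/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    # Fall back to the EOS token for padding if none is set.
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Hypothetical instruction corpus: a JSON-lines file with a "text" column.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Causal-LM collator (mlm=False) derives labels from input_ids automatically.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="qwen2.5-7b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=1e-5,
    bf16=True,
    logging_steps=10,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

For parameter-efficient alternatives, adapter methods such as LoRA (e.g., via the peft library) can substantially reduce the memory needed for this step.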