mitchcross895/Qwen2.5-7B-Instruct
Qwen2.5-7B-Instruct is a 7.61 billion parameter instruction-tuned causal language model developed by Qwen, part of the Qwen2.5 series. It significantly improves upon its predecessor with broader knowledge and stronger coding and mathematical capabilities, drawing on specialized expert models in those domains. It excels at instruction following, long text generation up to 8K tokens, structured data understanding, and JSON output, and supports a 131,072-token context length. The model also offers robust multilingual support for over 29 languages.
Qwen2.5-7B-Instruct Overview
Qwen2.5-7B-Instruct is an instruction-tuned causal language model from the Qwen2.5 series, developed by Qwen. This 7.61 billion parameter model builds upon Qwen2 with substantial improvements across several key areas.
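As a quick illustration of how an instruction-tuned model like this is typically used, below is a minimal sketch with the Hugging Face transformers chat-template workflow. The repository id, prompt, and generation settings are illustrative assumptions, not part of this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed upstream repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat prompt using the model's built-in chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain grouped-query attention in two sentences."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate up to 512 new tokens (illustrative setting) and decode only the reply.
output_ids = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```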
Key Capabilities & Improvements
- Enhanced Knowledge & Reasoning: Significantly improved capabilities in coding and mathematics, benefiting from specialized expert models.
- Instruction Following & Output Quality: Demonstrates better instruction following, improved generation of long texts (over 8K tokens), and enhanced understanding of structured data like tables. It also excels at generating structured outputs, particularly JSON.
- Robustness: More resilient to diverse system prompts, which improves role-play implementation and chatbot condition-setting.
- Extended Context Length: Supports a full context length of 131,072 tokens, with generation of up to 8,192 tokens. YaRN (Yet another RoPE extensioN) scaling is used to extend the native 32,768-token window to the full length; note that static YaRN, as implemented in frameworks such as vLLM, applies a fixed scaling factor to all inputs and may therefore reduce quality on shorter texts (see the configuration sketch after this list).
- Multilingual Support: Offers comprehensive support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, and Arabic.
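For the long-context point above, the following sketch shows one way YaRN scaling can be enabled when loading the model with transformers. The rope_scaling values mirror what the upstream Qwen2.5 documentation recommends, but treat the exact numbers and repository id as assumptions to verify against the official model card.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed upstream repository id

config = AutoConfig.from_pretrained(model_id)
# Assumed YaRN settings: scale the native 32,768-token window by 4x to reach
# roughly 131,072 tokens of context. Enable this only when long inputs are
# actually needed, since static scaling (as in vLLM) is applied uniformly and
# can hurt quality on short texts.
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```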
Architecture & Features
This model is built on a transformer architecture incorporating RoPE, SwiGLU, RMSNorm, and attention QKV bias. It has 28 layers and a grouped-query attention (GQA) configuration with 28 query heads and 4 key/value heads. The model went through both pretraining and post-training stages, with post-training focused on instruction-following performance.
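These architectural parameters can be read directly from the model's configuration. The sketch below assumes the standard transformers config fields used by Qwen2-family models (num_hidden_layers, num_attention_heads, num_key_value_heads); the expected values in the comments come from this card.

```python
from transformers import AutoConfig

# Assumed upstream repository id; adjust to the mirror you actually use.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

print(config.num_hidden_layers)        # transformer layers (expected: 28)
print(config.num_attention_heads)      # query heads (expected: 28)
print(config.num_key_value_heads)      # key/value heads under GQA (expected: 4)
print(config.max_position_embeddings)  # native context window
```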