Qwen/Qwen2-7B

Parameters: 7.6B
Quantization: FP8
Context length: 131,072 tokens
License: apache-2.0
Overview

Qwen2-7B: A Powerful Base Language Model

Qwen2-7B is a 7.6-billion-parameter base model from the Qwen2 series, developed by the Qwen team. It is built on the Transformer architecture with SwiGLU activation, attention QKV bias, and grouped query attention (GQA), and it uses an improved tokenizer that is adaptive to multiple natural languages and code.
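
These architecture details can be checked directly against the model's configuration. The sketch below is an illustration rather than part of the official card: it downloads only the config via Hugging Face transformers and verifies that the number of key/value heads is smaller than the number of attention heads, which is what grouped query attention means in practice.

```python
from transformers import AutoConfig

# Fetch only the configuration for Qwen/Qwen2-7B (no model weights).
config = AutoConfig.from_pretrained("Qwen/Qwen2-7B")

# GQA: fewer key/value heads than query heads, so KV caches shrink accordingly.
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)

# SwiGLU gates the MLP with SiLU, which transformers reports as the hidden activation.
print("activation:", config.hidden_act)
```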

Key Capabilities & Performance

Qwen2-7B has shown competitive performance against state-of-the-art open-source models, as well as its predecessor Qwen1.5, across a wide array of benchmarks (see the evaluation sketch after this list). Its strengths are particularly evident in:

  • English Tasks: Achieves 70.3 on MMLU, 40.0 on MMLU-Pro, and 62.6 on BBH, outperforming Mistral-7B, Gemma-7B, and Llama-3-8B in several key metrics.
  • Coding Tasks: Demonstrates marked gains with 51.2 on HumanEval, 65.9 on MBPP, and 54.2 on EvalPlus, again ahead of the models listed above.
  • Mathematics: Excels with 79.9 on GSM8K and 44.2 on MATH.
  • Multilingual Support: Shows strong results in Chinese tasks (83.2 on C-Eval, 83.9 on CMMLU) and other multilingual benchmarks like Multi-Exam (59.2) and Multi-Understanding (72.0).
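
For readers who want to sanity-check numbers like these, one common option is EleutherAI's lm-evaluation-harness. The sketch below is an assumption on my part, not the evaluation setup behind the figures above; the harness's prompts and few-shot settings can yield scores that differ from published ones.

```python
import lm_eval  # pip install lm-eval (also requires accelerate for HF models)

# Evaluate the base model on MMLU; 5-shot is the usual reporting convention.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen2-7B,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,
)
print(results["results"]["mmlu"])
```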

Usage Recommendations

As a base language model, Qwen2-7B is primarily intended for further development. Users are advised to apply post-training techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), or continued pretraining to adapt it for specific text generation tasks.
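
As a concrete starting point before any fine-tuning, the base model can still be exercised with plain text completion. The following is a minimal sketch using the Hugging Face transformers API (it assumes a GPU and the accelerate package for device_map="auto"); note that a base model continues raw text rather than following chat-style instructions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires accelerate
)

# Base models do plain continuation, so phrase the input as a prefix.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```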