Qwen/Qwen2-72B

Status: Warm · Access: Public · Parameters: 72.7B · Quantization: FP8 · Context length: 131,072 tokens · License: other · Source: Hugging Face
Overview

Qwen2-72B: A High-Performance Dense Language Model

Qwen2-72B is a 72.7-billion-parameter base language model from the Qwen2 series, built on the Transformer architecture. It incorporates architectural enhancements such as SwiGLU activation, attention QKV bias, and group query attention, alongside an improved tokenizer that is adaptive to multiple natural languages and code.
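Because it is a plain causal language model, the base checkpoint can be loaded directly with the Hugging Face transformers library for next-token completion. The following is a minimal sketch, not an official usage recipe: it assumes a recent transformers release with Qwen2 support, the accelerate package for device placement, and enough GPU memory for the 72B weights (in practice, multiple high-memory GPUs or offloading). The prompt is illustrative.

```python
# Minimal completion sketch for the Qwen2-72B base model (assumptions:
# transformers with Qwen2 support, accelerate installed, sufficient GPUs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-72B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 halves memory relative to fp32
    device_map="auto",           # shard layers across available GPUs
)

# Base model: plain next-token completion, no chat template.
prompt = "The key architectural features of the Transformer are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the model simply continues the prompt; it will not follow instructions or hold a conversation without further post-training.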

Key Capabilities & Performance Highlights

This model has been rigorously evaluated across a broad spectrum of benchmarks, showcasing its advanced capabilities:

  • English Tasks: Achieves 84.2 on MMLU, 55.6 on MMLU-Pro, and 82.4 on BBH, demonstrating strong language understanding and reasoning.
  • Coding Tasks: Excels in programming, scoring 64.6 on HumanEval, 76.9 on MBPP, and 59.6 on MultiPL-E, indicating proficiency across multiple programming languages.
  • Mathematics: Performs robustly with 89.5 on GSM8K and 51.1 on MATH.
  • Multilingual Support: Shows high scores on C-Eval (91.0), CMMLU (90.1), and various Multi-Exam and Multi-Understanding benchmarks, highlighting its strong cross-lingual abilities.

Qwen2-72B generally outperforms the earlier Qwen1.5 models and is competitive with other state-of-the-art open-source models across these domains, positioning it as an alternative to proprietary models in many settings. Because this is a base model rather than an instruction-tuned one, it is not recommended for direct text generation; instead, apply post-training such as SFT, RLHF, or continued pretraining to adapt it to a specific application, as in the sketch below.
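For illustration, here is a hedged sketch of one such post-training step: LoRA-based supervised fine-tuning with transformers and peft. The dataset file (sft_data.jsonl), LoRA targets, and hyperparameters are hypothetical assumptions, not recommendations from the model card; at this scale, full-parameter fine-tuning would additionally require multi-GPU strategies such as FSDP or DeepSpeed, which are omitted here for brevity.

```python
# Hedged SFT sketch (assumptions: transformers, peft, datasets, accelerate
# installed; "sft_data.jsonl" is a hypothetical file whose "text" column
# already contains formatted prompt + response strings).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM, AutoTokenizer,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

model_id = "Qwen/Qwen2-72B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Train only low-rank adapters to keep memory tractable for a 72B model.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen2-72b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    # Causal-LM collator: labels are the input ids, shifted inside the model.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()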