Overview
Qwen2-7B-Instruct is a 7.6-billion-parameter instruction-tuned model in the Qwen2 series, developed by the Qwen team. It is built on the Transformer architecture, incorporating features such as SwiGLU activation, attention QKV bias, and grouped-query attention (GQA). The model was pretrained on a large corpus and further aligned with supervised fine-tuning and direct preference optimization (DPO).
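As an illustration of one of the architectural features mentioned above, the SwiGLU feed-forward gating can be sketched in a few lines of NumPy. This is a minimal, self-contained sketch, not the model's actual implementation; the weight names (`W_gate`, `W_up`) and sizes are illustrative.

```python
import numpy as np

def silu(x):
    # SiLU (a.k.a. Swish) activation: x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def swiglu(x, W_gate, W_up):
    # SwiGLU gating: SiLU(x @ W_gate) multiplied elementwise by (x @ W_up).
    # In a real Transformer block, a down-projection would follow.
    return silu(x @ W_gate) * (x @ W_up)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))        # (batch, hidden) -- toy sizes
W_gate = rng.standard_normal((8, 16))  # hidden -> intermediate
W_up = rng.standard_normal((8, 16))
out = swiglu(x, W_gate, W_up)
print(out.shape)  # (2, 16)
```

The gated product lets the network learn which components of the up-projection to pass through, which is one reason SwiGLU is favored over a plain ReLU feed-forward in many recent models.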
Key Capabilities & Features
- Extended Context Window: Supports processing up to 131,072 tokens using YARN (Yet Another RoPE extensioN), a technique for extending rotary position embeddings to longer contexts.
- Multilingual Support: Uses an improved tokenizer that adapts well to multiple natural languages as well as code.
- Strong Benchmark Performance: Outperforms many similar-sized open-source models, including Qwen1.5-7B-Chat, across various benchmarks.
- Coding: Achieves 79.9 on HumanEval, 67.2 on MBPP, and 70.3 on EvalPlus.
- Mathematics: Scores 82.3 on GSM8K and 49.6 on MATH.
- English & Chinese: Demonstrates competitive results on MMLU (70.5), MMLU-Pro (44.1), C-Eval (77.2), and AlignBench (7.21).
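To enable the extended 131,072-token context noted above, the Qwen2 model card describes adding a YARN rope-scaling entry to the model's `config.json` for deployment. The fragment below reflects that published guidance; verify the exact field names and values against the current release and your serving framework before use.

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

Note that static rope scaling applies uniformly, so it can slightly affect quality on short inputs; enable it only when long-context handling is actually needed.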
When to Use This Model
- General-Purpose Applications: Suitable for a wide range of tasks requiring strong language understanding and generation.
- Long Context Processing: Ideal for use cases that involve extensive inputs, such as document analysis or summarization, due to its 131K token context window.
- Coding & Mathematical Tasks: Recommended for applications requiring robust performance in code generation and complex mathematical problem-solving.
- Multilingual Scenarios: Effective for applications needing support across various languages.