Qwen/Qwen2-1.5B-Instruct

Parameters: 1.5B · Precision: BF16 · Context length: 131,072 tokens · License: apache-2.0

Overview

Qwen2-1.5B-Instruct is an instruction-tuned model from the Qwen2 series, developed by the Qwen team. It is a 1.5-billion-parameter decoder-only Transformer language model incorporating features such as the SwiGLU activation, attention QKV bias, and grouped query attention. The model was pretrained on a large corpus and then post-trained with supervised fine-tuning and direct preference optimization, improving its conversational abilities.
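
The model can be run through the standard Hugging Face transformers chat workflow. The following is a minimal inference sketch rather than a reference implementation; the prompt text is illustrative, and device settings should be adapted to your hardware:

    # Minimal inference sketch with Hugging Face transformers.
    # Requires transformers >= 4.37 (Qwen2 support) and, for
    # device_map="auto", the accelerate package.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2-1.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype="auto",   # picks up the BF16 weights where supported
        device_map="auto",
    )

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain grouped query attention in one paragraph."},
    ]
    # Render the conversation with the model's built-in chat template.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))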

Key Capabilities & Performance

This model shows marked improvements over its predecessor, Qwen1.5-1.8B-Chat, despite having fewer parameters (1.5B vs. 1.8B). Benchmarks highlight gains across a range of tasks (a hedged reproduction sketch follows the list):

  • MMLU: Achieves 52.4, a notable increase from Qwen1.5-1.8B-Chat's 43.7.
  • HumanEval: Scores 37.8, up from 25.0, indicating stronger code generation capabilities.
  • GSM8K: Reaches 61.6, substantially higher than 35.3, showing improved mathematical reasoning.
  • C-Eval: Posts 63.8, compared to 55.3, reflecting better performance on Chinese language evaluations.
  • IFEval (Prompt Strict-Acc.): Scores 29.0, an improvement from 16.8, suggesting better instruction following.

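These published figures come from the Qwen team's own evaluation setup, which is not fully specified here. As a rough point of comparison only, a public harness such as EleutherAI's lm-evaluation-harness can score the model on some of the same tasks; the sketch below assumes lm-eval v0.4+ is installed, and its prompting and few-shot settings will not exactly match the official numbers:

    # Rough benchmark sketch via EleutherAI's lm-evaluation-harness
    # (pip install lm-eval). Task choice and few-shot settings here are
    # assumptions and will not reproduce the official table exactly.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=Qwen/Qwen2-1.5B-Instruct,dtype=bfloat16",
        tasks=["gsm8k", "mmlu"],
        num_fewshot=5,
        batch_size=8,
    )
    print(results["results"])  # per-task accuracy metrics
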
Architectural Enhancements

Qwen2 models use an improved tokenizer that adapts to multiple natural languages and to program code, which contributes to the series' strong multilingual and coding performance. The series aims to surpass many open-source models and to compete with proprietary alternatives across a broad range of benchmarks.
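
To see that coverage directly, one can inspect how the tokenizer segments mixed input. This is a small illustrative sketch; the sample strings are arbitrary:

    # Illustrative tokenizer inspection; the sample strings are arbitrary.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

    samples = [
        "The quick brown fox jumps over the lazy dog.",  # English
        "敏捷的棕色狐狸跳过了懒狗。",                      # Chinese
        "def greet(name): return f'Hello, {name}!'",     # Python code
    ]
    for text in samples:
        ids = tokenizer.encode(text)
        print(f"{len(ids):3d} tokens | {text}")

Fewer tokens for a given string generally indicate denser coverage of that language or domain in the tokenizer's vocabulary.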