Qwen/Qwen2-1.5B-Instruct
Text Generation · Open Weights
Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jun 3, 2024 · License: apache-2.0 · Architecture: Transformer

Qwen/Qwen2-1.5B-Instruct is a 1.5 billion parameter instruction-tuned causal language model developed by Qwen, part of the Qwen2 series. Built on a Transformer architecture with SwiGLU activation and group query attention, it features an improved tokenizer for multilingual and code adaptability. This model demonstrates strong performance across language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning benchmarks, making it suitable for a wide range of general-purpose conversational AI applications.


Overview

Qwen2-1.5B-Instruct is an instruction-tuned model from the Qwen2 series, developed by Qwen. It is a 1.5 billion parameter decoder-only language model based on the Transformer architecture, incorporating features like SwiGLU activation, attention QKV bias, and group query attention. The model was pretrained on a large dataset and further post-trained using supervised finetuning and direct preference optimization, enhancing its conversational abilities.
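Since the post-training targets conversation, prompts are rendered through a chat template before generation. As a minimal sketch (the authoritative template ships with the model's tokenizer via `tokenizer.apply_chat_template`; the ChatML-style markers below are an assumption based on how Qwen2 instruct variants are commonly formatted), the rendered prompt looks roughly like this:

```python
# Illustrative sketch of a ChatML-style chat template, as used by Qwen2
# instruct models. In practice, prefer tokenizer.apply_chat_template(...)
# rather than hand-building the string.
def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts into a single prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave the prompt open so the model continues as the assistant.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to LLMs."},
])
```

The tokenized prompt is then passed to the model's `generate` method; the special `<|im_start|>`/`<|im_end|>` tokens delimit turns so the model knows where the assistant's reply begins.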

Key Capabilities & Performance

This model demonstrates significant improvements over its predecessor, Qwen1.5-1.8B-Chat, despite its smaller parameter count. Benchmarks highlight its enhanced performance across various tasks:

  • MMLU: Achieves 52.4, a notable increase from Qwen1.5-1.8B-Chat's 43.7.
  • HumanEval: Scores 37.8, up from 25.0, indicating stronger code generation capabilities.
  • GSM8K: Reaches 61.6, substantially higher than 35.3, showing improved mathematical reasoning.
  • C-Eval: Posts 63.8, compared to 55.3, reflecting better performance on Chinese language evaluations.
  • IFEval (Prompt Strict-Acc.): Scores 29.0, an improvement from 16.8, suggesting better instruction following.

Architectural Enhancements

Qwen2 models feature an improved tokenizer designed for adaptability across multiple natural languages and programming languages, contributing to their strong multilingual and coding performance. The series aims to surpass many open-source models and compete with proprietary alternatives across diverse benchmarks.

Popular Sampler Settings

The top 3 parameter combinations used by Featherless users for this model tune the following samplers: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
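To show how these sampler parameters interact, here is a self-contained sketch of temperature scaling followed by top-k, top-p (nucleus), and min-p filtering over raw logits. The default values below are illustrative assumptions, not the actual user-reported configs; in practice these are passed as keyword arguments to the inference API or `generate` call.

```python
import math

def filter_logits(logits, temperature=0.7, top_k=20, top_p=0.8, min_p=0.0):
    """Apply temperature scaling, then top-k / top-p / min-p filtering.

    Returns a renormalized probability distribution over token ids;
    tokens removed by the filters get probability 0. Parameter defaults
    are illustrative, not the values users actually reported.
    """
    # Temperature-scaled softmax (max-subtracted for numerical stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least probable, keeping the smallest set
    # that satisfies all three cutoffs.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for rank, i in enumerate(order):
        if rank >= top_k:                      # top-k: at most k tokens
            break
        if probs[i] < min_p * probs[order[0]]: # min-p: relative to the best token
            break
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:                # top-p: smallest nucleus >= p
            break

    kept_total = sum(probs[i] for i in keep)
    return [p / kept_total if i in keep else 0.0 for i, p in enumerate(probs)]

dist = filter_logits([2.0, 1.0, 0.5, -1.0], temperature=1.0, top_k=2, top_p=1.0)
```

Lower temperature sharpens the distribution before filtering, while the penalty parameters (frequency, presence, repetition) adjust logits of already-seen tokens prior to this step.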