painslane/Qwen2-0.5B-Instruct

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

painslane/Qwen2-0.5B-Instruct is a 0.5-billion-parameter instruction-tuned causal language model from the Qwen2 series, developed by the Qwen team. Built on the Transformer architecture, it features SwiGLU activation, attention QKV bias, and group query attention, together with an improved tokenizer adapted to multiple natural languages and code. The model targets a broad range of tasks, including language understanding, generation, multilingual use, coding, mathematics, and reasoning, and demonstrates competitive performance against other open-source models.


Qwen2-0.5B-Instruct Overview

This model is the instruction-tuned 0.5 billion parameter variant from the new Qwen2 series of large language models, developed by Qwen. Qwen2 models are built on the Transformer architecture, incorporating features like SwiGLU activation, attention QKV bias, and group query attention, alongside an enhanced tokenizer designed for multiple natural languages and code.
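As an instruction-tuned chat model, it is typically driven through its chat template. The sketch below is a minimal usage example assuming the Hugging Face transformers library (with accelerate for `device_map="auto"`); the prompt and generation settings are illustrative, not prescribed by this card.

```python
# Minimal inference sketch; repo id taken from this card, settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "painslane/Qwen2-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
# Instruct variants ship a chat template; apply it to format the conversation.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens so only the model's reply is decoded.
generated = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)]
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```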

Key Capabilities & Performance

Qwen2 models, including this 0.5B instruction-tuned version, have shown strong performance across various benchmarks, often surpassing previous Qwen1.5 models and other open-source alternatives. This model is designed for a wide array of tasks, including:

  • Language Understanding and Generation
  • Multilingual Capabilities
  • Coding and Mathematics
  • Reasoning

Comparative evaluation highlights significant improvements over Qwen1.5-0.5B-Chat (Qwen1.5 scores in parentheses):

  • MMLU: 37.9 (vs 35.0)
  • HumanEval: 17.1 (vs 9.1)
  • GSM8K: 40.1 (vs 11.3)
  • C-Eval: 45.2 (vs 37.2)
  • IFEval (Prompt Strict-Acc.): 20.0 (vs 14.6)

Training Details

The model was pretrained on a large corpus and then post-trained with both supervised fine-tuning and direct preference optimization (DPO) to strengthen its instruction-following abilities.
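The exact preference data and hyperparameters are not published on this card. Purely as an illustration of the DPO objective mentioned above, the sketch below computes the standard DPO loss from per-sequence log-probabilities; the function name and the beta value are illustrative, not part of the released training recipe.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss from summed log-probs of chosen/rejected responses.

    Each argument is a tensor of shape (batch,) holding log p(y|x) under the
    policy being trained and under the frozen reference (SFT) model.
    beta controls how far the policy may drift from the reference.
    """
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    # Maximize the margin by which the policy prefers the chosen response,
    # measured relative to the reference model.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```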

Use Cases

Given its broad capabilities and improved performance in a compact 0.5B size, this model is suitable for applications requiring efficient language processing, coding assistance, and reasoning, especially where resource constraints are a factor.
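For memory-constrained deployments, the model can additionally be loaded in reduced precision. The sketch below uses 4-bit quantization via bitsandbytes; this assumes the bitsandbytes and accelerate packages and a CUDA device, and the quantization settings shown are illustrative rather than recommended by this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "painslane/Qwen2-0.5B-Instruct"

# 4-bit NF4 quantization keeps the 0.5B model's memory footprint small.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
```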