Model Overview
aloobun/d-Qwen1.5-0.5B is a roughly 0.6-billion-parameter language model built on the Qwen1.5 series and published by aloobun. It is the product of a distillation experiment in which Qwen1.5-1.8B served as the teacher and Qwen1.5-0.5B as the student, with training samples drawn from the Pile dataset.
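The card does not spell out the distillation objective, but a response-based setup along the following lines is typical. The temperature, loss weighting, optimizer settings, and data handling in this sketch are illustrative assumptions, not the recipe actually used for this model.

```python
# Minimal sketch of response-based distillation: Qwen1.5-1.8B teacher -> 0.5B student.
# Temperature, loss weights, and optimizer settings are assumptions, not the
# configuration used to train d-Qwen1.5-0.5B.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B", torch_dtype=torch.bfloat16).eval()
student = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B")
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
T = 2.0  # assumed softening temperature

def distill_step(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        teacher_logits = teacher(**inputs).logits
    out = student(**inputs, labels=inputs["input_ids"])
    # Soft-label loss: KL divergence between temperature-softened teacher and
    # student token distributions (both models share the Qwen1.5 vocabulary).
    kd_loss = F.kl_div(
        F.log_softmax(out.logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Blend with the student's ordinary next-token loss (weights are assumptions).
    loss = 0.5 * kd_loss + 0.5 * out.loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```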
Key Performance Improvements
Despite being far smaller than its teacher, the student model outperforms the base Qwen1.5-0.5B on specific benchmarks:
- TruthfulQA: The student model achieved 39.29, surpassing the base model's 38.3.
- GSM8K: The student model scored 17.06, an improvement over the base model's 16.3.
Architectural Foundation
The Qwen1.5 series, on which this model is based, employs a Transformer architecture featuring SwiGLU activation, attention QKV bias, and grouped-query attention (GQA). It also incorporates an improved tokenizer designed for multiple natural languages and code. This distilled version aims to retain strong performance in a more compact form factor.
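The checkpoint loads through the standard Hugging Face transformers interface like any other Qwen1.5 base model; the prompt and generation settings below are placeholders rather than recommendations.

```python
# Loading the distilled checkpoint with the standard transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aloobun/d-Qwen1.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Plain text completion, since this is a base (non-chat) model.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```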
Ideal Use Cases
This model is particularly well-suited for applications where:
- Resource efficiency is critical, given its roughly 0.6B parameters (see the footprint sketch after this list).
- Tasks require improved factual accuracy (TruthfulQA) and mathematical problem-solving (GSM8K) within a smaller model footprint.
- Developers need a compact model that outperforms its direct base version on specific reasoning tasks.
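As a rough way to gauge that footprint, one can count parameters and approximate weight memory after loading; the exact figures depend on dtype and runtime, so treat this only as a sanity check.

```python
# Rough footprint check for the distilled model (numbers vary with dtype and runtime).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("aloobun/d-Qwen1.5-0.5B", torch_dtype=torch.float16)
n_params = sum(p.numel() for p in model.parameters())
approx_gib = sum(p.numel() * p.element_size() for p in model.parameters()) / 1024**3
print(f"parameters: {n_params / 1e9:.2f}B, approx weight memory: {approx_gib:.2f} GiB")
```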