moos124/qwen-2.5-1.5B-instruct-SDFT

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 5, 2026Architecture:Transformer Warm

moos124/qwen-2.5-1.5B-instruct-SDFT is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. This model was trained using the Self-Training with On-Policy Self-Distillation (SDFT) method, which is designed for language model alignment. It is suitable for general instruction-following tasks, leveraging its specialized training approach to enhance performance.

Loading preview...

Model Overview

This model, moos124/qwen-2.5-1.5B-instruct-SDFT, is a 1.5 billion parameter instruction-tuned language model. It is built upon the base architecture of Qwen/Qwen2.5-1.5B-Instruct and has undergone further fine-tuning.

Key Differentiator: SDFT Training

The primary distinction of this model lies in its training methodology. It was fine-tuned using SDFT (Self-Training with On-Policy Self-Distillation), a method detailed in the paper "Self-Training with On-Policy Self-Distillation for Language Model Alignment" (arXiv:2601.19897). This technique aims to improve the alignment of language models through a self-training and self-distillation process.

Capabilities

  • Instruction Following: Designed to respond effectively to user instructions, leveraging its instruction-tuned base and SDFT fine-tuning.
  • General Purpose: Suitable for a variety of natural language processing tasks where instruction adherence is crucial.

Training Frameworks

The fine-tuning process utilized the TRL (Transformers Reinforcement Learning) library, with specific versions of key frameworks including TRL 1.3.0, Transformers 5.7.0, Pytorch 2.11.0, Datasets 4.8.5, and Tokenizers 0.22.2.