Name: moos124/qwen-2.5-1.5B-instruct-SDFT API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: moos124

Model Overview

This model, moos124/qwen-2.5-1.5B-instruct-SDFT, is a 1.5 billion parameter instruction-tuned language model. It is built upon the base architecture of Qwen/Qwen2.5-1.5B-Instruct and has undergone further fine-tuning.

Key Differentiator: SDFT Training

The primary distinction of this model lies in its training methodology. It was fine-tuned using SDFT (Self-Training with On-Policy Self-Distillation), a method detailed in the paper "Self-Training with On-Policy Self-Distillation for Language Model Alignment" (arXiv:2601.19897). This technique aims to improve the alignment of language models through a self-training and self-distillation process.

Capabilities

Instruction Following: Designed to respond effectively to user instructions, leveraging its instruction-tuned base and SDFT fine-tuning.
General Purpose: Suitable for a variety of natural language processing tasks where instruction adherence is crucial.

Training Frameworks

The fine-tuning process utilized the TRL (Transformers Reinforcement Learning) library, with specific versions of key frameworks including TRL 1.3.0, Transformers 5.7.0, Pytorch 2.11.0, Datasets 4.8.5, and Tokenizers 0.22.2.

Overview

Model Overview

Key Differentiator: SDFT Training

Capabilities

Training Frameworks

Full Model Card (README)