lllyx/Qwen3-1.7B-SFT
lllyx/Qwen3-1.7B-SFT is a 1.7 billion parameter supervised fine-tuned model based on Qwen3-1.7B-Base, developed by lllyx. It is specifically trained on the OpenThought3-Qwen3-4B dataset with a 20480-token context length, focusing on enhancing mathematical reasoning and problem-solving capabilities. This model is intended for use in research related to on-policy distillation of large language models, as detailed in the associated paper.
Loading preview...
Model Overview
lllyx/Qwen3-1.7B-SFT is a supervised fine-tuned (SFT) model derived from the Qwen3-1.7B-Base architecture. It has undergone full-parameter fine-tuning to specialize in mathematical reasoning and problem-solving, utilizing a substantial context length of 20480 tokens.
Key Characteristics
- Base Model: Qwen3-1.7B-Base
- Training Objective: Improved performance on math-focused instruction-following and reasoning tasks.
- Context Length: 20480 tokens, enabling processing of longer mathematical problems.
- Associated Research: Developed in conjunction with the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe" (arXiv:2604.13016).
Training Details
The model was fine-tuned using LLaMA-Factory with a full finetuning type and bf16 precision. The training dataset, OpenThought3-Qwen3-4B (lllyx/OpenThought3-Qwen3-4B), consists of math-domain prompts and Qwen3-4B (Non-thinking) generated answers. The dataset underwent cleaning processes including deduplication and removal of degenerate outputs to ensure high quality.