lllyx/Qwen3-1.7B-SFT

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:Mar 22, 2026License:otherArchitecture:Transformer0.0K Warm

lllyx/Qwen3-1.7B-SFT is a 1.7 billion parameter supervised fine-tuned model based on Qwen3-1.7B-Base, developed by lllyx. It is specifically trained on the OpenThought3-Qwen3-4B dataset with a 20480-token context length, focusing on enhancing mathematical reasoning and problem-solving capabilities. This model is intended for use in research related to on-policy distillation of large language models, as detailed in the associated paper.

Loading preview...

Model Overview

lllyx/Qwen3-1.7B-SFT is a supervised fine-tuned (SFT) model derived from the Qwen3-1.7B-Base architecture. It has undergone full-parameter fine-tuning to specialize in mathematical reasoning and problem-solving, utilizing a substantial context length of 20480 tokens.

Key Characteristics

  • Base Model: Qwen3-1.7B-Base
  • Training Objective: Improved performance on math-focused instruction-following and reasoning tasks.
  • Context Length: 20480 tokens, enabling processing of longer mathematical problems.
  • Associated Research: Developed in conjunction with the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe" (arXiv:2604.13016).

Training Details

The model was fine-tuned using LLaMA-Factory with a full finetuning type and bf16 precision. The training dataset, OpenThought3-Qwen3-4B (lllyx/OpenThought3-Qwen3-4B), consists of math-domain prompts and Qwen3-4B (Non-thinking) generated answers. The dataset underwent cleaning processes including deduplication and removal of degenerate outputs to ensure high quality.