lllyx/Qwen3-1.7B-Base-OPD

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 3, 2026 | License: other | Architecture: Transformer

Qwen3-1.7B-Base-OPD is an approximately 2-billion-parameter (1.7B) Qwen3-based causal language model developed by lllyx and trained with on-policy distillation (OPD). It was initialized from Qwen3-1.7B-Base and trained with Qwen3-4B-Base-GRPO as the teacher model on the DAPO-Math-17k dataset, targeting mathematical reasoning and problem-solving tasks. The model has a 32,768-token context length and is designed to transfer the teacher's mathematical capabilities into a smaller model.


Model Overview

lllyx/Qwen3-1.7B-Base-OPD is an approximately 2-billion-parameter (1.7B) language model initialized from Qwen3-1.7B-Base. It was trained with On-Policy Distillation (OPD), in which the student generates its own responses and is trained toward the behavior of a larger "teacher" model, here lllyx/Qwen3-4B-Base-GRPO. Training prompts come from the DAPO-Math-17k dataset, so the distillation targets mathematical reasoning and problem-solving.
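To make the on-policy idea concrete, here is a minimal sketch of the per-token signal such a setup can use: tokens are sampled from the student, and each one is scored by the log-ratio between student and teacher probabilities. Averaged over student samples, this log-ratio (often called the k1 estimator) estimates the reverse KL divergence KL(student || teacher). The toy 4-token distributions and all function names below are hypothetical illustrations, not taken from the model's training code, and the card does not state exactly how this term enters the final loss.

```python
import math
import random


def sample_token(probs, rng):
    """Draw one token index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def k1_log_ratio(student_probs, teacher_probs, token):
    """Per-token log-ratio: log p_student(token) - log p_teacher(token)."""
    return math.log(student_probs[token]) - math.log(teacher_probs[token])


def exact_reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) in closed form, for comparison only."""
    return sum(
        s * (math.log(s) - math.log(t))
        for s, t in zip(student_probs, teacher_probs)
        if s > 0
    )


def estimate_reverse_kl(student_probs, teacher_probs, n_samples=20000, seed=0):
    """Monte Carlo estimate from tokens sampled from the student (on-policy)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        token = sample_token(student_probs, rng)
        total += k1_log_ratio(student_probs, teacher_probs, token)
    return total / n_samples


# Hypothetical next-token distributions over a 4-token vocabulary.
student = [0.40, 0.30, 0.20, 0.10]
teacher = [0.70, 0.15, 0.10, 0.05]

if __name__ == "__main__":
    print("exact reverse KL:", exact_reverse_kl(student, teacher))
    print("sampled estimate:", estimate_reverse_kl(student, teacher))
```

Because the samples come from the student rather than from a fixed dataset, the signal concentrates on the tokens the student actually produces, which is the property that distinguishes on-policy distillation from training on teacher-generated text.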

Key Characteristics

  • Base Model: Qwen3-1.7B-Base
  • Teacher Model: lllyx/Qwen3-4B-Base-GRPO
  • Training Method: On-Policy Distillation (OPD) with GRPO-style rollouts
  • Primary Domain: Mathematical reasoning and problem-solving
  • Context Length: 32768 tokens
  • Precision: bfloat16

Training Details

The model was trained using the verl framework, combining a policy-gradient term with the k1 distillation loss mode. Rollouts sampled 4 responses per prompt, with a prompt length of 1024 tokens and a response length of 7168 tokens (8192 tokens in total). A rule-based math reward function was used for optimization. The distillation aims to transfer the teacher's mathematical reasoning to the more compact 1.7B-parameter student.
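As an illustration of what a rule-based math reward can look like, the sketch below scores a response 1.0 when its final \boxed{...} answer matches the reference after light normalization, and 0.0 otherwise. The actual rules used in training are not described here, so the \boxed{} convention, the normalization, and every function name are assumptions made for the example. The constants restate the rollout settings given above.

```python
# Rollout settings as stated in the training details.
MAX_PROMPT_TOKENS = 1024
MAX_RESPONSE_TOKENS = 7168
RESPONSES_PER_PROMPT = 4


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


def normalize(answer):
    """Assumed normalization: drop spaces and a trailing period."""
    return answer.strip().replace(" ", "").rstrip(".")


def math_reward(response, ground_truth):
    """Binary reward: 1.0 for a matching boxed answer, else 0.0."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(ground_truth) else 0.0


# Four hypothetical rollouts for one prompt whose reference answer is 12.
rollouts = [
    r"... so the answer is \boxed{12}.",
    r"... therefore \boxed{ 12 }",
    r"... we get \boxed{13}",
    "... no final answer given",
]
rewards = [math_reward(r, "12") for r in rollouts]

if __name__ == "__main__":
    print("rewards:", rewards)
    print("token budget per rollout:", MAX_PROMPT_TOKENS + MAX_RESPONSE_TOKENS)
```

A string-matching rule like this is cheap and deterministic, but it treats equivalent forms such as 0.5 and \frac{1}{2} as different answers; production reward functions typically add more normalization.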

Intended Use

This model is intended for applications that require mathematical reasoning, such as solving math problems, generating mathematical explanations, or assisting with quantitative analysis.
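For trying the model on a math problem, a minimal sketch using the Hugging Face transformers library might look like the following. It assumes the checkpoint loads through the standard AutoModelForCausalLM path and that a plain-text prompt is acceptable for a base model; the prompt wording, the helper names, and the generation settings are illustrative assumptions, not a documented recommendation.

```python
MODEL_ID = "lllyx/Qwen3-1.7B-Base-OPD"


def build_prompt(problem):
    """Plain-text prompt; this template is an assumption, not a documented format."""
    return f"Problem: {problem}\nSolution:"


def generate_solution(problem, model_id=MODEL_ID, max_new_tokens=512):
    """Load the model in bfloat16 and greedily decode a solution."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(problem), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_solution("What is the sum of the first 10 positive integers?"))
```

Running the script downloads the weights on first use; long derivations may need a larger max_new_tokens value, within the model's 32768-token context length.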