Name: lllyx/Qwen3-1.7B-Base-OPD API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: lllyx

Model Overview

lllyx/Qwen3-1.7B-Base-OPD is a 1.7 billion parameter language model derived from Qwen3-1.7B-Base. It was trained with On-Policy Distillation (OPD), in which the student model learns on its own generated responses under the guidance of a larger "teacher" model, lllyx/Qwen3-4B-Base-GRPO. The distillation uses the DAPO-Math-17k dataset to transfer the teacher's mathematical reasoning and problem-solving abilities to the smaller student.

Key Characteristics

Base Model: Qwen3-1.7B-Base
Teacher Model: lllyx/Qwen3-4B-Base-GRPO
Training Method: On-Policy Distillation (OPD) with GRPO-style rollouts
Primary Domain: Mathematical reasoning and problem-solving
Context Length: 32768 tokens
Precision: bfloat16
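
As a rough illustration of what the precision setting implies, the sketch below estimates the memory needed for the weights alone. It assumes the nominal figure of 1.7 billion parameters (the exact count may differ) and 2 bytes per bfloat16 value; the function name weight_memory_gb is hypothetical, and activations and the KV cache are not included.

```python
# Back-of-the-envelope estimate of weight memory, not a measured value.
# Assumption: nominal parameter count of 1.7e9 taken from the model name.
NOMINAL_PARAMS = 1.7e9
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each value in 16 bits


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"Approximate weight memory: {weight_memory_gb(NOMINAL_PARAMS):.1f} GB")
```

Actual serving memory will be higher, since long contexts add a KV cache that grows with the number of tokens.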

Training Details

The model was trained with the verl framework, combining a policy-gradient term with a k1 distillation loss mode. Rollouts sampled 4 responses per prompt, with a prompt length of 1024 tokens and a response length of 7168 tokens. A rule-based math reward function was used for optimization. The goal of the distillation is to transfer the larger teacher's mathematical reasoning to the more compact 1.7B parameter student.
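
The sketch below illustrates the quantities mentioned above. It assumes that "k1" refers to the simple log-ratio KL estimator (student log-probability minus teacher log-probability, evaluated on tokens sampled from the student); this is an interpretation, not the verl implementation. The constant and function names are hypothetical, and the log-probabilities in the usage example are made up for illustration.

```python
import math

# Rollout settings stated in the training details.
PROMPT_LEN = 1024
RESPONSE_LEN = 7168
RESPONSES_PER_PROMPT = 4

# Upper bound on tokens in one prompt + response sequence.
MAX_SEQ_LEN = PROMPT_LEN + RESPONSE_LEN


def k1_per_token(student_logprobs, teacher_logprobs):
    """Per-token k1 estimate: log p_student(token) - log p_teacher(token).

    Both lists hold log-probabilities of the same student-sampled tokens.
    Assumption: this log-ratio form is what the "k1" loss mode denotes.
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("log-probability lists must have the same length")
    return [s - t for s, t in zip(student_logprobs, teacher_logprobs)]


def mean_k1(student_logprobs, teacher_logprobs):
    """Average the per-token k1 values over one response."""
    values = k1_per_token(student_logprobs, teacher_logprobs)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Made-up log-probabilities for a two-token response.
    student = [math.log(0.5), math.log(0.25)]
    teacher = [math.log(0.25), math.log(0.25)]
    print("max sequence length:", MAX_SEQ_LEN)
    print("mean k1:", mean_k1(student, teacher))
```

A positive per-token value means the student assigns more probability to its sampled token than the teacher does, so minimizing the average pulls the student's distribution toward the teacher's on the student's own rollouts.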

Intended Use

This model is intended for applications that require mathematical reasoning, such as solving math problems, generating mathematical explanations, or assisting with quantitative analysis.
