Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_rel_1e-2_1p0_0p0_1p0_grpo_42_rule, is a specialized fine-tuned version of Qwen/Qwen3-1.7B-Base, featuring approximately 1.7 billion parameters and a context length of 32,768 tokens. It was developed by Kazuki1450 and trained using the Hugging Face TRL (Transformer Reinforcement Learning) framework.
Key Differentiator: GRPO Training
A core aspect of this model is its training methodology, which uses GRPO (Group Relative Policy Optimization). This technique, introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is designed to enhance a model's capabilities in complex mathematical reasoning tasks. It makes the model particularly adept at problems that require logical deduction and numerical precision.
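At a high level, GRPO replaces a learned value function with a group baseline: for each prompt it samples a group of completions, scores them with a reward function, and normalizes each reward against the group's own mean and standard deviation. The sketch below illustrates only that advantage computation (the function name and use of the population standard deviation are illustrative assumptions, not TRL's exact implementation):

```python
# Illustrative sketch of GRPO's group-relative advantage step.
# For one prompt, `rewards` holds the scores of a sampled group of
# completions; each advantage is the reward normalized by the group's
# mean and standard deviation (no learned critic needed).
import statistics

def group_relative_advantages(rewards):
    """Return per-completion advantages for one group of rewards."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std; a small epsilon is often added in practice
    if std == 0:
        # All completions scored the same: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

print(group_relative_advantages([0.0, 2.0]))  # → [-1.0, 1.0]
```

Completions scored above the group average receive positive advantages and are reinforced; those below average are penalized, which is what drives the policy toward higher-reward reasoning traces.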
Training Details
The model's training process leveraged specific versions of popular machine learning frameworks:
- TRL: 0.29.0
- Transformers: 4.57.6
- PyTorch: 2.9.0
- Datasets: 4.8.2
- Tokenizers: 0.22.2
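To reproduce the training environment, the library versions listed above can be pinned at install time (this command is a sketch assembled from the version list, not an install script shipped with the model):

```shell
pip install trl==0.29.0 transformers==4.57.6 torch==2.9.0 datasets==4.8.2 tokenizers==0.22.2
```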
Recommended Use Cases
Given its GRPO-enhanced training, this model is particularly well-suited for:
- Mathematical problem-solving: Excelling in tasks that demand strong mathematical reasoning.
- Scientific computing: Assisting with calculations, formula derivation, and data interpretation.
- Logical deduction: Applications requiring precise and structured reasoning.
Developers can quickly get started using the provided transformers pipeline example for text generation.
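A minimal sketch of that pipeline usage is below. The model id is taken from this card; the prompt and generation settings (`max_new_tokens`) are illustrative assumptions, and loading the checkpoint requires downloading its weights from the Hugging Face Hub:

```python
# Minimal text-generation sketch using the transformers pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_rel_1e-2_1p0_0p0_1p0_grpo_42_rule",
)

# Example math-flavored prompt; max_new_tokens is an illustrative choice.
output = generator(
    "If a train travels 120 km in 1.5 hours, its average speed in km/h is",
    max_new_tokens=64,
)
print(output[0]["generated_text"])
```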