Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_rel_1e2_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of Qwen/Qwen3-1.7B-Base, with 1.7 billion parameters and a 32,768-token context window. It was developed by Kazuki1450 and trained using the TRL framework.
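As a standard Hugging Face checkpoint, the model can be loaded with the `transformers` library. The sketch below is a minimal example, assuming `transformers` (and a suitable backend such as PyTorch) is installed; the prompt and generation settings are illustrative defaults, not values recommended by the author.

```python
# Minimal inference sketch for the fine-tuned checkpoint.
# The repository id comes from the model card; everything else here is
# an illustrative assumption, not a documented recommendation.
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_rel_1e2_1p0_0p0_1p0_grpo_42_rule"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint and return the model's completion of `prompt`."""
    # Local import: transformers is a heavy dependency, only needed at call time.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the newly generated text is returned.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(generate("Solve step by step: 12 * 7 - 5 ="))
```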
Key Differentiator: GRPO Training
The primary distinction of this model lies in its training methodology. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO replaces PPO's learned value-function baseline with a group-relative one: several completions are sampled per prompt, each is scored by a reward function, and each completion's advantage is computed relative to the other completions in its group. The method is specifically designed to improve the mathematical reasoning abilities of large language models.
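The group-relative normalization at the heart of GRPO can be sketched in a few lines of plain Python. This is a schematic of the advantage computation only, not the author's training code: rewards within one group of sampled completions are standardized against the group's own mean and standard deviation.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantages for one group of sampled completions.

    Instead of a learned value-function baseline (as in PPO), each reward
    is normalized against the mean and standard deviation of its own group.
    `eps` guards against division by zero when all rewards are identical.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four completions for one math prompt, rewarded 1.0 if correct.
# Correct completions get positive advantages, incorrect ones negative.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```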
Intended Use Cases
Given its specialized training with GRPO, this model is particularly well-suited for:
- Mathematical problem-solving: Tasks that require logical deduction and numerical accuracy.
- Reasoning-intensive applications: Scenarios where robust analytical capabilities are crucial.
- Research in mathematical AI: Exploring the effectiveness of GRPO in enhancing model performance on complex mathematical challenges.
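For math problem-solving, GRPO pipelines typically score completions with a verifiable, rule-based reward (the `rule` suffix in the model name suggests such a setup, though the actual reward is not documented). The sketch below shows one hypothetical rule: extract the last number from a completion and compare it to a reference answer. The exact rule, function name, and string format are assumptions for illustration.

```python
import re

def rule_based_reward(completion: str, gold_answer: str) -> float:
    """Return 1.0 if the completion's final number matches the reference.

    Hypothetical rule-based reward: take the last integer or decimal that
    appears in the completion and compare it (as a string) to the gold
    answer. This is an illustrative stand-in, not the author's reward.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0  # no numeric answer found
    return 1.0 if numbers[-1] == gold_answer.strip() else 0.0
```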
Training Details
The model's training process is publicly logged and can be visualized via Weights & Biases. It was built using specific versions of key frameworks:
- TRL: 0.29.0
- Transformers: 4.57.6
- PyTorch: 2.9.0
- Datasets: 4.8.2
- Tokenizers: 0.22.2
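With those libraries installed, a GRPO run can be set up through TRL's `GRPOTrainer`. The sketch below assumes recent TRL's `GRPOConfig`/`GRPOTrainer` interface; the dataset, reward function, and hyperparameters are placeholders, since the author's actual configuration is not documented in this card.

```python
# Hedged training sketch with TRL's GRPOTrainer. The toy dataset, the
# placeholder reward, and all hyperparameters are assumptions for
# illustration; they are not the author's documented setup.
def build_trainer():
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    # Toy single-prompt dataset; the real training data is unknown.
    train_dataset = Dataset.from_dict({"prompt": ["What is 6 * 7?"]})

    def toy_reward(completions, **kwargs):
        # Placeholder rule-based reward: 1.0 if the completion mentions "42".
        return [1.0 if "42" in c else 0.0 for c in completions]

    args = GRPOConfig(
        output_dir="grpo-out",
        num_generations=4,       # completions sampled per prompt (the "group")
        max_completion_length=64,
    )
    return GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",  # the base checkpoint named in this card
        reward_funcs=toy_reward,
        args=args,
        train_dataset=train_dataset,
    )

if __name__ == "__main__":
    build_trainer().train()
```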
This fine-tuned model offers a focused approach to improving mathematical reasoning, making it a valuable tool for specific analytical and computational tasks.