Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-5_1p0_0p0_1p0_grpo_1_rule
Text generation | Concurrency cost: 1 | Model size: 2B | Quant: BF16 | Ctx length: 32k | Published: Jan 22, 2026 | Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-5_1p0_0p0_1p0_grpo_1_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base with a 40960-token context length. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method designed to enhance mathematical reasoning in large language models, and is optimized for tasks that require robust mathematical problem solving and logical deduction.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-5_1p0_0p0_1p0_grpo_1_rule, is a fine-tuned variant of Qwen/Qwen3-1.7B-Base with approximately 1.7 billion parameters and a 40960-token context window. It was trained using the TRL framework.
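
Assuming the repository follows the standard Hugging Face Hub layout (which the card metadata suggests but does not confirm), loading the checkpoint with transformers should look like the minimal sketch below; the prompt is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-5_1p0_0p0_1p0_grpo_1_rule"

# Load in bfloat16 to match the BF16 quantization listed on the card.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Plain-text prompting, since this is a base-model fine-tune (example prompt only).
prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens and print only the generated completion.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```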

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO samples a group of completions for each prompt and computes advantages from their relative rewards, which removes the need for a separate value (critic) model; here the training is aimed at improving proficiency on mathematical reasoning tasks.
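
The exact dataset, reward function, and hyperparameters behind this checkpoint are not published (the model name only hints at a rule-based reward and a 1e-5 learning rate). As a rough illustration, a minimal sketch of GRPO fine-tuning of the same base model with TRL's GRPOTrainer might look like the following; the dataset and rule_reward function are illustrative assumptions, not the author's recipe:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical rule-based reward: 1.0 if the completion contains the reference
# answer, else 0.0. The reward actually used for this model is not published.
def rule_reward(completions, answer, **kwargs):
    return [1.0 if a in c else 0.0 for c, a in zip(completions, answer)]

# Toy stand-in dataset; extra columns (here "answer") are passed to the reward
# function as keyword arguments by GRPOTrainer.
train_dataset = Dataset.from_list([
    {"prompt": "If 3x + 7 = 22, what is x?", "answer": "5"},
])

args = GRPOConfig(
    output_dir="qwen3-1.7b-grpo",
    learning_rate=1e-5,   # matches the "1e-5" in the model name
    num_generations=8,    # group size per prompt; an assumption
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```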

Potential Use Cases

  • Mathematical Problem Solving: Ideal for applications requiring the model to understand and solve complex mathematical problems.
  • Logical Reasoning: Suitable for tasks that benefit from enhanced logical deduction capabilities.
  • Research and Development: Can serve as a base for further experimentation in improving mathematical and reasoning performance in smaller language models.