Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. This model was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to enhance mathematical reasoning capabilities. It is optimized for tasks requiring robust mathematical and logical processing, making it suitable for specialized reasoning applications. The model supports a context length of 40,960 tokens.
Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule, is a fine-tuned variant of Qwen/Qwen3-1.7B-Base, with approximately 1.7 billion parameters and a context length of 40,960 tokens. It was developed by Kazuki1450 and trained using the TRL framework.
Key Differentiator: GRPO Training
The primary distinction of this model lies in its training methodology. It leverages GRPO (Group Relative Policy Optimization), a reinforcement learning method detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This specialized training aims to significantly improve the model's proficiency in mathematical reasoning tasks.
Training Details
- Base Model: Qwen/Qwen3-1.7B-Base
- Training Framework: TRL (Transformer Reinforcement Learning)
- Methodology: GRPO, focused on enhancing mathematical reasoning.
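The core idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of completions per prompt, score each with a reward (here, rule-based), and normalize rewards within the group to obtain per-completion advantages, removing the need for a separate value network. A minimal sketch of that normalization step (illustrative only, not this repository's actual training code):

```python
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize a group of rewards to zero mean and unit variance.

    Each reward scores one sampled completion for the same prompt;
    the normalized value is that completion's advantage in GRPO.
    """
    g = len(rewards)
    mean = sum(rewards) / g
    std = (sum((r - mean) ** 2 for r in rewards) / g) ** 0.5
    if std == 0:  # all completions scored the same: no learning signal
        return [0.0] * g
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers, rewarded 1.0 when the final answer is correct.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [1.0, -1.0, -1.0, 1.0]
```

Completions scoring above the group mean get positive advantages (reinforced), those below get negative ones, which is what steers the policy toward higher-reward reasoning traces.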
Use Cases
Given its GRPO-based training, this model is particularly well-suited for applications that demand:
- Mathematical problem-solving
- Logical reasoning tasks
- Scientific computing assistance
Developers can integrate this model using the transformers library, as demonstrated in the quick start guide, to generate responses for complex questions, especially those with a mathematical or logical underpinning.
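A minimal loading sketch with the `transformers` library (this assumes the checkpoint is published on the Hugging Face Hub under the repo id above; the prompt and generation parameters are illustrative, not tuned for this checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sampling settings here are illustrative defaults, not recommendations
# from the model author.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this model is fine-tuned from a base (non-chat) checkpoint, plain text prompts as above are the safest default; whether a chat template applies depends on how the fine-tune was trained.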