Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_dr_grpo_42_rule
Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_dr_grpo_42_rule is a 1.7 billion parameter language model, fine-tuned from Qwen/Qwen3-1.7B-Base. The model was trained with the GRPO method, which is designed to enhance mathematical reasoning capabilities. Building on the Qwen3-1.7B architecture, it is particularly suited for tasks that require robust logical and mathematical problem-solving.
Overview
This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_dr_grpo_42_rule, is a specialized fine-tuned version of the Qwen/Qwen3-1.7B-Base model. It leverages the TRL (Transformers Reinforcement Learning) framework for its training process.
Key Capabilities & Training
The primary differentiator of this model is its training methodology: it was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This indicates a focus on improving the model's ability to handle complex mathematical reasoning tasks.
Technical Details
- Base Model: Qwen3-1.7B-Base
- Training Framework: TRL (version 0.29.0)
- Core Training Method: GRPO, aimed at enhancing mathematical reasoning.
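The model card does not include the training script, but TRL ships a `GRPOTrainer` that implements this method. The sketch below is a minimal, hypothetical example of how a rule-based GRPO run on the base model might be set up; the reward function, dataset columns, and configuration values are assumptions for illustration, not the recipe used to produce this checkpoint.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical rule-based reward: 1.0 if the expected answer string
# appears in the completion, otherwise 0.0.
def rule_reward(completions, answer, **kwargs):
    return [1.0 if ans in completion else 0.0
            for completion, ans in zip(completions, answer)]

# Tiny illustrative dataset; the real training data is not documented here.
train_dataset = Dataset.from_dict({
    "prompt": ["What is 12 * 7? Answer with a number."],
    "answer": ["84"],
})

config = GRPOConfig(
    output_dir="qwen3-1.7b-grpo",  # assumed output path
    num_generations=4,             # completions sampled per prompt (assumed)
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```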
Use Cases
Given its fine-tuning with GRPO, this model is particularly well-suited for applications that require:
- Mathematical problem-solving
- Logical reasoning tasks
- Scientific or engineering computations where robust numerical understanding is critical.
Developers can quickly integrate and test the model using a standard transformers text-generation pipeline, as sketched below.
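A minimal inference sketch follows; the prompt text and generation settings are illustrative assumptions, since the card does not prescribe a specific prompt format for this checkpoint.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint with the standard text-generation pipeline.
generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_dr_grpo_42_rule",
)

# Hypothetical math-reasoning prompt.
prompt = "Solve step by step: If 3x + 5 = 20, what is x?"
output = generator(prompt, max_new_tokens=256)
print(output[0]["generated_text"])
```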