Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_sapo_42_rule
Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_sapo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper for enhancing mathematical reasoning. The model is intended for tasks that benefit from improved reasoning capabilities.
Model Overview
This model, developed by Kazuki1450, is a fine-tuned version of Qwen3-1.7B-Base, with approximately 1.7 billion parameters. Training was carried out with the TRL (Transformer Reinforcement Learning) library.
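Since the model inherits the standard Hugging Face causal-LM interface from Qwen3-1.7B-Base, it can presumably be loaded and queried as in the following sketch. The prompt and generation settings are illustrative, not taken from the model card.

```python
# Minimal inference sketch for this checkpoint (assumes the standard
# transformers causal-LM interface inherited from Qwen3-1.7B-Base).
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_sapo_42_rule"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the sketch stays lightweight when only inspected.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Solve step by step: what is 12 * 17?"))
```

As a base-model fine-tune, it may respond better to completion-style prompts than to chat templates.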
Key Differentiator: GRPO Training
A distinguishing aspect of this model is its training method, GRPO (Group Relative Policy Optimization), originally introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." The use of GRPO suggests the fine-tuning was optimized for stronger reasoning, particularly in complex problem-solving scenarios.
Technical Details
- Base Model: Qwen/Qwen3-1.7B-Base
- Training Framework: TRL (Transformers Reinforcement Learning)
- Core Training Method: GRPO, as detailed in the DeepSeekMath paper.
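A training run like this one can be approximated with TRL's `GRPOTrainer`. The sketch below is hypothetical: the `_rule` suffix in the model name hints at a rule-based reward, but the actual reward function, dataset, and hyperparameters are not published, so the toy reward and the placeholder dataset here are assumptions.

```python
# Hypothetical GRPO training sketch using TRL. The reward is a toy rule-based
# check standing in for the (unpublished) reward suggested by the "_rule"
# suffix in the model name.
def rule_based_reward(completions, **kwargs):
    """Return 1.0 for completions containing a \\boxed{...} answer, else 0.0."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

if __name__ == "__main__":
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    config = GRPOConfig(
        output_dir="qwen3-1.7b-grpo",
        num_generations=8,  # group size for relative advantage estimation
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=rule_based_reward,
        args=config,
        # Placeholder dataset; the actual training data is not documented.
        train_dataset=load_dataset("trl-lib/tldr", split="train"),
    )
    trainer.train()
```

GRPO scores a group of sampled completions per prompt and normalizes rewards within the group, which avoids training a separate value model.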
Potential Use Cases
Given its GRPO-based training, this model is likely well-suited for applications requiring:
- Improved logical reasoning.
- Tasks involving mathematical problem-solving or complex analytical thinking.
- Scenarios where fine-grained policy optimization can lead to better performance.