Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_2_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, and builds on the Qwen3 architecture. It is particularly suited to tasks requiring mathematical problem-solving and logical deduction.
Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_2_rule, is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It starts from the Qwen3 base model and was trained using GRPO (Group Relative Policy Optimization).
Key Capabilities
- Enhanced Mathematical Reasoning: The primary differentiator of this model is its training with GRPO (Group Relative Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This indicates a focus on improving the model's ability to handle complex mathematical problems and logical reasoning tasks.
- Fine-tuned Performance: As a fine-tuned version, it aims to offer specialized performance beyond the base Qwen3-1.7B model, particularly in areas where GRPO's benefits are most pronounced.
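The model should load like any other causal LM on the Hub. The sketch below is an illustrative inference example, not an official snippet from the authors: the prompt format used during training is not documented on this card, so `build_prompt` is a plain instruction-style template chosen for illustration, and the generation settings are generic defaults.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_2_rule"

def build_prompt(problem: str) -> str:
    # Illustrative instruction-style prompt for a math problem; the exact
    # prompt format used during GRPO training is not documented here.
    return (
        "Solve the following problem step by step.\n\n"
        f"Problem: {problem}\nSolution:"
    )

if __name__ == "__main__":
    # Heavy imports and the model download are kept inside the main guard.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    inputs = tokenizer(build_prompt("What is 17 * 24?"), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```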
Training Details
The model was trained with the TRL (Transformer Reinforcement Learning) library using the GRPO method. This training approach optimizes the model's reasoning behavior via reinforcement learning, making it a strong candidate for applications requiring robust mathematical and logical processing.
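A training setup of this kind can be sketched with TRL's `GRPOTrainer`. This is a minimal, assumption-laden sketch: the actual reward functions, dataset, and hyperparameters used for this model are not documented on the card. The `rule_based_reward` below is a toy rule-based reward (the `_rule` suffix in the model name suggests rule-based rewards were used, but their definition is unknown), and the dataset choice is purely illustrative.

```python
def rule_based_reward(completions, **kwargs):
    # Toy rule-based reward for illustration only: +1.0 if the completion
    # contains a boxed final answer, else 0.0. The real reward rules used
    # for this model are not documented.
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

if __name__ == "__main__":
    # Heavy imports and training are kept inside the main guard; this is a
    # sketch of the TRL GRPO API, not the authors' actual training script.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Illustrative dataset choice; the actual training data is unknown.
    dataset = load_dataset("trl-lib/tldr", split="train")

    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",      # base model from the card
        reward_funcs=rule_based_reward,
        args=GRPOConfig(output_dir="grpo-output"),
        train_dataset=dataset,
    )
    trainer.train()
```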
Use Cases
This model is particularly well-suited for:
- Mathematical Problem Solving: Tasks involving arithmetic, algebra, calculus, and other mathematical domains.
- Logical Reasoning: Applications that require deductive or inductive reasoning.
- Research in Reasoning Models: As an example of a GRPO-trained model, it can be valuable for researchers exploring advanced reasoning techniques in LLMs.