Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_parentheses_1p0_0p0_1p0_grpo_42_rule

Text generation · Model size: 2B (1.7B parameters) · Quant: BF16 · Context length: 32k · Published: Mar 18, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_parentheses_1p0_0p0_1p0_grpo_42_rule is a 1.7-billion-parameter language model fine-tuned by Kazuki1450 from Qwen3-1.7B-Base. It was trained with the GRPO method, which is designed to enhance mathematical reasoning. The model is optimized for tasks requiring stronger logical and mathematical processing, building on the base Qwen3 model with its 32K context length.


Model Overview

This model, developed by Kazuki1450, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, featuring approximately 1.7 billion parameters and a 32,768 token context length. It leverages the Qwen3 foundation, known for its strong general language understanding.
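The model can be loaded like any Hugging Face checkpoint; a minimal sketch is below. The repo id comes from this card, while the prompt template and generation settings are illustrative assumptions, not a documented format.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_parentheses_1p0_0p0_1p0_grpo_42_rule"


def build_math_prompt(question: str) -> str:
    """Wrap a question in a minimal completion-style template.
    (Illustrative; this base-model checkpoint has no official chat template.)"""
    return f"Question: {question}\nAnswer:"


if __name__ == "__main__":
    # Heavy dependencies are imported here so the helper above
    # stays importable without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16  # BF16, matching the card's quant field
    )

    inputs = tokenizer(build_math_prompt("What is 17 * 24?"), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```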

Key Differentiator: GRPO Training

The primary distinction of this model is its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This training approach specifically aims to enhance the model's capabilities in mathematical reasoning and problem-solving.

Training Details

  • Base Model: Qwen/Qwen3-1.7B-Base
  • Training Framework: TRL (Transformer Reinforcement Learning)
  • Methodology: GRPO, focused on improving mathematical reasoning.
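The training setup above can be sketched with TRL's `GRPOTrainer`. The actual reward function, dataset, and hyperparameters for this checkpoint are not published; the balanced-parentheses rule reward below is a guess suggested by the "parentheses" and "rule" tokens in the model name, and the dataset is a placeholder.

```python
def balanced_parentheses_reward(completions: list[str], **kwargs) -> list[float]:
    """Rule-based reward: 1.0 if parentheses in a completion are balanced, else 0.0.
    (Hypothetical; the reward actually used for this checkpoint is undocumented.)"""

    def balanced(text: str) -> bool:
        depth = 0
        for ch in text:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:  # closing paren with no matching opener
                    return False
        return depth == 0

    return [1.0 if balanced(c) else 0.0 for c in completions]


if __name__ == "__main__":
    # trl and datasets are only needed to actually launch training.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    train_dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset
    args = GRPOConfig(output_dir="qwen3-1.7b-grpo", num_generations=8, seed=42)
    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=balanced_parentheses_reward,
        args=args,
        train_dataset=train_dataset,
    )
    trainer.train()
```

GRPO samples a group of completions per prompt and normalizes rewards within the group, so a cheap rule-based scorer like this can stand in for a learned reward model.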

Potential Use Cases

  • Applications requiring enhanced logical deduction.
  • Tasks involving mathematical problem-solving or reasoning.
  • Scenarios where a smaller, specialized model for numerical or logical tasks is beneficial.