Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring advanced mathematical problem-solving and logical deduction, building upon the Qwen3 architecture with a 40960-token context length.
Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule, is a specialized fine-tuned version of the Qwen3-1.7B-Base model. It retains the base model's 1.7 billion parameter architecture and supports an extensive context length of 40960 tokens, making it suitable for processing longer inputs.
Key Differentiator: GRPO Training
The primary distinction of this model lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This technique is specifically designed to improve a model's ability in mathematical reasoning and complex problem-solving.
Training Details
- Base Model: Qwen/Qwen3-1.7B-Base
- Training Framework: TRL (Transformer Reinforcement Learning) version 0.23.0
- Core Method: GRPO, focused on enhancing mathematical reasoning.
Potential Use Cases
Given its GRPO-enhanced training, this model is particularly well-suited for applications requiring:
- Mathematical problem-solving
- Logical reasoning tasks
- Scientific computing assistance
- Educational tools for math and logic
Developers can quickly integrate this model using the transformers text-generation pipeline, especially for tasks that benefit from its improved reasoning capabilities.
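A minimal sketch of such an integration is shown below. It uses the standard transformers pipeline API with the model ID from this card; the prompt, generation parameters, and lazy-import structure are illustrative choices, not part of the model's documentation, and the first call will download the fine-tuned weights from the Hugging Face Hub.

```python
# Model ID from this card.
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule"


def build_generator():
    """Create a text-generation pipeline for this model.

    The import is deferred so this module can be loaded even in
    environments where transformers is not installed.
    """
    from transformers import pipeline

    # Downloads the fine-tuned weights on first use (several GB).
    return pipeline("text-generation", model=MODEL_ID)


if __name__ == "__main__":
    generator = build_generator()
    # Example prompt playing to the model's GRPO-trained strengths:
    # step-by-step mathematical reasoning.
    prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
    result = generator(prompt, max_new_tokens=256, do_sample=False)
    print(result[0]["generated_text"])
```

Greedy decoding (`do_sample=False`) is used here because deterministic output is usually preferable when checking mathematical answers; sampling parameters can be passed instead for more varied generations.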