Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e-2_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of Qwen3-1.7B-Base, with roughly 1.7 billion parameters and a 32,768-token context window. It was developed by Kazuki1450 and trained using the TRL framework.
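Assuming the checkpoint is hosted on the Hugging Face Hub under the id above, it can be loaded with the standard `transformers` API. This is a minimal sketch: the prompt and generation settings are illustrative, not taken from the model's documentation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e-2_1p0_0p0_1p0_grpo_42_rule"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the fine-tuned checkpoint and produce a completion for one prompt."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Example math-style prompt; output will vary with sampling settings.
    print(generate("If x + 3 = 10, what is x?"))
```

Because this is a base-model fine-tune, plain-text prompts (rather than a chat template) are the safer default.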
Key Differentiator: GRPO Training
A core aspect of this model is its training methodology. It leverages GRPO (Group Relative Policy Optimization), introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO is a reinforcement-learning method that estimates advantages from groups of sampled completions rather than a learned value model, and it is used here to improve the model's performance on mathematical reasoning tasks.
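The "_rule" suffix in the model name suggests a rule-based reward. In TRL's GRPO setup, a reward function is a plain Python callable that scores each sampled completion; a hypothetical sketch is below. The "Answer:" tag format and the exact-match rule are assumptions for illustration, not this model's documented training recipe.

```python
import re

def exact_answer_reward(completions, ground_truths):
    """Rule-based reward: 1.0 if the final answer extracted from a completion
    exactly matches the reference string, else 0.0.

    Hypothetical sketch: assumes completions end with "Answer: <value>".
    """
    rewards = []
    for completion, truth in zip(completions, ground_truths):
        match = re.search(r"Answer:\s*(-?[\d.,/]+)", completion)
        answer = match.group(1).rstrip(".") if match else None
        rewards.append(1.0 if answer == truth else 0.0)
    return rewards
```

During GRPO training, rewards like this are computed per completion within a sampled group, and each completion's advantage is its reward relative to the group's mean.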
Capabilities
- Enhanced Mathematical Reasoning: Optimized through GRPO for better performance on complex mathematical problems.
- Causal Language Modeling: Inherits the base capabilities of the Qwen3-1.7B-Base model for text generation and understanding.
- Extended Context Window: Supports a 32,768-token context, allowing longer inputs and outputs to be processed in a single pass.
When to Use This Model
This model is particularly well-suited for applications where strong mathematical reasoning and problem-solving are critical. If your use case involves tasks that benefit from advanced logical deduction or numerical understanding, this GRPO-trained model offers a specialized alternative to general-purpose LLMs.