Model Overview
This model, developed by Kazuki1450, is a fine-tuned variant of Qwen3-1.7B-Base, with approximately 2 billion parameters and a context length of 32768 tokens. It was fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath paper to improve mathematical reasoning in language models.
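As a minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub (the repo id below is a hypothetical placeholder, not confirmed by this card), the model works with the standard Transformers API:

```python
# Minimal loading sketch. The repo id is a placeholder/assumption;
# substitute the actual Hub id of this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Kazuki1450/Qwen3-1.7B-GRPO"  # hypothetical id, not stated in the card

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")

# Sanity checks against the numbers quoted above:
print(sum(p.numel() for p in model.parameters()))  # ~2e9 parameters
print(model.config.max_position_embeddings)        # 32768-token context
```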
Key Capabilities
- Enhanced Mathematical Reasoning: GRPO fine-tuning improves performance on mathematical and logical tasks (see the inference sketch after this list).
- Base Model Foundation: Built upon the robust Qwen3-1.7B-Base, providing a strong general language understanding foundation.
- Extended Context Window: Supports a 32768-token context length, allowing the model to process longer inputs and maintain coherence across extended dialogues or documents.
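Continuing the loading sketch above, here is an illustrative generation call on a small math word problem. The prompt and decoding settings are assumptions for demonstration; since the underlying base model is not instruction-tuned, the example treats it as a plain causal LM with no chat template:

```python
# Illustrative generation on a math word problem; continues the loading
# sketch above (tokenizer and model already loaded). The prompt and
# decoding settings are arbitrary, not recommendations from this card.
prompt = (
    "Question: A train travels 120 km in 1.5 hours. "
    "What is its average speed in km/h?\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,  # greedy decoding keeps the sketch deterministic
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```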
Training Details
The model was trained with the TRL library (version 0.29.0) alongside other standard frameworks, including Transformers (4.57.3) and PyTorch (2.9.0). GRPO, the method central to its fine-tuning, dispenses with a separate value model: it samples a group of completions per prompt and normalizes each completion's reward within its group to estimate advantages. The method is detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
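The exact training recipe is not documented here, but as a hedged sketch of what a GRPO run looks like with TRL's `GRPOConfig` and `GRPOTrainer`, the following may help; the dataset, reward function, and hyperparameters are placeholders rather than the ones used for this model:

```python
# Sketch of a GRPO fine-tuning run with TRL. The dataset, reward
# function, and hyperparameters are illustrative placeholders; the
# actual recipe for this model is not documented in the card.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a dataset with a "prompt" column.
train_dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

def toy_reward(completions, **kwargs):
    """Toy reward: favors completions ending in a digit, as a stand-in
    for a real verifier that checks a final numeric answer."""
    return [1.0 if c.strip() and c.strip()[-1].isdigit() else 0.0
            for c in completions]

args = GRPOConfig(
    output_dir="qwen3-1.7b-grpo",
    num_generations=8,   # GRPO samples a group of completions per prompt
    learning_rate=1e-6,  # illustrative value
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=toy_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```

In a real mathematical-reasoning setup, the reward function would typically be a verifier that compares the extracted final answer against ground truth; that group-scored, verifiable signal is exactly what GRPO is designed to exploit.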
When to Use This Model
This model is particularly suitable for applications that demand strong mathematical problem-solving or logical deduction, where GRPO's reasoning-focused training pays off. Its large context window also makes it useful for processing and generating longer texts while maintaining contextual awareness.