Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e-2_1p0_0p0_1p0_grpo_42_rule

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Mar 18, 2026 | Architecture: Transformer | Warm

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e-2_1p0_0p0_1p0_grpo_42_rule is a language model with roughly 1.7 billion parameters (listed as 2B), fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO, a reinforcement learning method introduced to improve mathematical reasoning. It has a context length of 32768 tokens and targets mathematical problem-solving and logical deduction.

Model Overview

This model is a fine-tuned variant of Qwen/Qwen3-1.7B-Base with roughly 1.7 billion parameters (listed as 2B) and a context length of 32768 tokens. It was developed by Kazuki1450 and trained using the TRL framework.
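The card does not publish the training script, dataset, or reward function, so the following is only a rough sketch of what a GRPO run with TRL can look like. The reward function `exact_answer_reward`, the toy dataset, the `answer` column, and every hyperparameter value are hypothetical placeholders, not the author's settings. `GRPOTrainer` and `GRPOConfig` are TRL classes; TRL and `datasets` are imported inside `train()` so that the reward function can be tried without them installed.

```python
import re


def exact_answer_reward(completions, answer, **kwargs):
    """Hypothetical rule-based reward.

    Returns 1.0 when the last number in a completion equals the
    reference answer, else 0.0. TRL passes extra dataset columns
    (here `answer`) to reward functions as keyword arguments.
    """
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        hit = bool(numbers) and float(numbers[-1]) == float(ref)
        rewards.append(1.0 if hit else 0.0)
    return rewards


def train():
    # Imported lazily: these packages are only needed for real training.
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    # Toy stand-in data; the real training set is not described in the card.
    dataset = Dataset.from_list([
        {"prompt": "What is 2 + 3 + 4? Answer:", "answer": "9"},
        {"prompt": "What is 10 + 5 + 1? Answer:", "answer": "16"},
    ])

    # Assumed values, chosen only to keep the sketch small.
    config = GRPOConfig(
        output_dir="qwen3-grpo-sketch",
        per_device_train_batch_size=4,
        num_generations=4,
        max_completion_length=128,
    )

    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=exact_answer_reward,
        args=config,
        train_dataset=dataset,
    )
    trainer.train()
```

Calling `train()` starts the run. A rule-based reward like this one needs no learned reward model, which is one common reason to pair it with GRPO; whether the author's "rule" suffix refers to a similar design is not stated.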

Key Differentiator: GRPO Training

A core aspect of this model is its training methodology. It uses GRPO (Group Relative Policy Optimization), introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO samples a group of completions for each prompt and scores each one relative to the group's average reward, which removes the need for a separate value model. The method was proposed to improve performance on mathematical reasoning tasks.
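To make the group-relative idea concrete, here is a minimal sketch of the advantage computation at the heart of GRPO: each completion's reward is normalized by the mean and standard deviation of the rewards in its group. The function name, the `eps` stabilizer, and the use of the population standard deviation are illustrative choices, not details taken from this model's training run.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of sampled completions.

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    `eps` is an assumed stabilizer that avoids division by zero
    when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two judged correct, two incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average get a positive advantage and are reinforced; those below it get a negative one. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.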

Capabilities

  • Enhanced Mathematical Reasoning: Trained with GRPO with the aim of improving performance on mathematical problems.
  • Causal Language Modeling: Inherits the base capabilities of the Qwen3-1.7B-Base model for text generation and understanding.
  • Extended Context Window: Supports a 32K token context, allowing for processing and generating longer sequences of text.

When to Use This Model

This model targets applications centered on mathematical reasoning and problem-solving. For tasks that depend on step-by-step logical deduction or numerical work, it offers a GRPO-trained alternative to general-purpose LLMs.
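A minimal sketch of loading the model with the Hugging Face `transformers` library and generating an answer to a math question follows. It assumes the checkpoint loads through the standard `AutoModelForCausalLM` and `AutoTokenizer` classes, as Qwen3 base models do. The `build_prompt` helper and its plain-text prompt format are hypothetical: the card does not document a prompt template, and since the model derives from a base (non-chat) checkpoint, no chat template is assumed here.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e-2_1p0_0p0_1p0_grpo_42_rule"


def build_prompt(question):
    """Hypothetical plain-text prompt; the card specifies no template."""
    return f"Question: {question}\nAnswer:"


def generate(question, max_new_tokens=256):
    # Imported lazily so build_prompt can be used without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed for this model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    with torch.no_grad():
        # Greedy decoding keeps the output deterministic.
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `print(generate("What is 12 + 7 + 30?"))` downloads the weights on first use and prints the model's continuation. Greedy decoding is used here for repeatability; sampling settings may suit other workloads better.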