Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e0_1p0_0p0_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Mar 18, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e0_1p0_0p0_1p0_grpo_42_rule is an approximately 2 billion parameter (1.7B) language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper, and is optimized for tasks that benefit from mathematical reasoning and robust problem-solving. With a 32,768 token context length, it suits applications that require deep contextual understanding and precise logical inference.


Overview

This model, developed by Kazuki1450, is a fine-tuned version of Qwen3-1.7B-Base, with approximately 2 billion parameters and a 32,768 token context length. It was trained using the TRL framework with GRPO (Group Relative Policy Optimization), a reinforcement learning technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), suggesting an optimization toward stronger mathematical and reasoning capabilities.
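The core idea of GRPO is to drop the learned value function used by PPO and instead normalize each completion's reward against the other completions sampled for the same prompt. A minimal sketch of that group-relative advantage computation (illustrative only; the actual TRL `GRPOTrainer` handles batching, clipping, and the KL term):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each sampled completion's reward
    by the mean and standard deviation of its own group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for one prompt, scored by a rule-based reward:
advantages = group_relative_advantages([1.0, 0.0, 0.5, 1.0])
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below the mean are penalized, all without training a separate critic.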

Key Capabilities

  • Mathematical Reasoning: Leverages the GRPO training method, indicating a focus on improving mathematical problem-solving and logical inference.
  • Base Model Foundation: Built upon the Qwen3-1.7B-Base architecture, providing a strong general language understanding foundation.
  • Extended Context Window: Supports a 32768 token context length, enabling processing of longer inputs and maintaining coherence over extended dialogues or documents.

Good For

  • Mathematical Tasks: Ideal for applications requiring robust mathematical reasoning, complex calculations, or logical deduction.
  • Research and Development: Suitable for researchers exploring advanced reinforcement learning techniques like GRPO in language models.
  • Context-Heavy Applications: Beneficial for use cases where understanding and generating text based on extensive context is crucial.
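A minimal sketch of running the model with the Hugging Face `transformers` library. The plain-text prompt format below is an assumption (this is a base-model fine-tune, so no chat template is presumed); adjust it to whatever format the model was actually trained on.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_rel_1e0_1p0_0p0_1p0_grpo_42_rule"

def build_prompt(question: str) -> str:
    # Hypothetical formatting helper, not part of the model card.
    return f"Question: {question}\nAnswer:"

if __name__ == "__main__":
    # Deferred import so the helper above is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed in the model metadata.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")

    inputs = tokenizer(build_prompt("What is 17 * 24?"), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading a ~2B parameter model in BF16 requires roughly 4 GB of memory plus activation overhead; for longer contexts toward the 32k limit, budget additional memory for the KV cache.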