Kazuki1450/Qwen3-1.7B-Base_csum_3_10_1p0_0p1_1p0_grpo_42_rule

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Mar 18, 2026 | Architecture: Transformer | Warm

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_1p0_0p1_1p0_grpo_42_rule is a language model fine-tuned from Qwen/Qwen3-1.7B-Base (about 1.7 billion parameters, listed here as 2B). It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning in language models. The fine-tuning targets reasoning tasks, particularly mathematical ones, and the model supports a 32K-token context length.


Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_3_10_1p0_0p1_1p0_grpo_42_rule, is a fine-tuned version of Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO samples a group of responses for each prompt and scores every response relative to the group's average reward, which removes the need for a separate value (critic) model. The goal of this training is to improve the model's performance on multi-step reasoning tasks.
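
To make the "group relative" part of GRPO concrete, the sketch below computes outcome-level advantages for one group of sampled responses by normalizing each reward against the group's mean and standard deviation. It is an illustration only, not this model's training code: the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for the example.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumptions for this sketch: population standard deviation, and a
    small eps so a group with identical rewards does not divide by zero.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rule-based rewards for four sampled answers to one prompt:
# 1.0 for a correct final answer, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above the group mean receive a positive advantage and are reinforced; those below receive a negative one. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.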

Key Capabilities

  • Enhanced Reasoning: Fine-tuning with GRPO targets the model's reasoning ability, particularly in mathematical domains.
  • Qwen3-1.7B-Base Foundation: Built on Qwen/Qwen3-1.7B-Base, it inherits that model's general language understanding and generation abilities.
  • Extended Context Window: Supports a context length of 32,768 tokens, so long prompts and long generations can fit in a single pass.
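
As a rough illustration of working within the 32,768-token window, the sketch below loads the model with the Hugging Face `transformers` library and checks that the prompt plus the requested generation length fits before calling `generate`. The helper names `fits_in_context`, `load_model`, and `generate_text` are hypothetical, and loading in BF16 is an assumption based on the quantization listed for this model.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_1p0_0p1_1p0_grpo_42_rule"
CONTEXT_LENGTH = 32768  # context length stated for this model


def fits_in_context(prompt_tokens, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Return True if the prompt plus the generation budget fits the window."""
    return prompt_tokens + max_new_tokens <= context_length


def load_model(model_id=MODEL_ID):
    """Load tokenizer and model; imports are local so the helper above
    can be used without transformers or torch installed."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: BF16 weights, matching the listed quantization.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


def generate_text(tokenizer, model, prompt, max_new_tokens=256):
    """Generate a continuation, refusing requests that exceed the window."""
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    if not fits_in_context(prompt_len, max_new_tokens):
        raise ValueError("prompt plus max_new_tokens exceeds the context length")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_model()` followed by `generate_text(tokenizer, model, "Question: What is 17 + 25?\nAnswer:")`. Because this is a base-model fine-tune, a plain completion-style prompt is used here rather than a chat template; that choice is also an assumption.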

Good For

  • Mathematical Problem Solving: Suited to applications that need step-by-step mathematical reasoning, the target of the GRPO training.
  • Complex Logical Tasks: Suitable for scenarios where logical deduction and structured thinking matter.
  • Research and Development: Provides a starting point for further experimentation and fine-tuning on reasoning tasks with a small model (about 1.7 billion parameters).