Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_After_1p0_0p0_1p0_grpo_42_rule

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jan 13, 2026 · Architecture: Transformer

Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_After_1p0_0p0_1p0_grpo_42_rule is a 1.5-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with the GRPO method, which is designed to enhance mathematical reasoning in open language models, making it well suited to applications where precise logical and mathematical problem-solving is crucial. The model supports a 32,768-token (32k) context length for processing extensive inputs.


Model Overview

This model, Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_After_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned version of the Qwen/Qwen2.5-1.5B-Instruct base model, with 1.5 billion parameters and a 32,768-token context length. It was developed by Kazuki1450 and trained using the TRL framework.

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It was fine-tuned with GRPO (Group Relative Policy Optimization), introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024). GRPO is a reinforcement-learning technique specifically designed to improve a model's ability to handle complex mathematical and logical reasoning tasks. A minimal training sketch is shown below.
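Since the model card states that TRL was used, the fine-tuning recipe plausibly resembles TRL's `GRPOTrainer` API. The following sketch is illustrative only: the dataset, the rule-based reward function (suggested by the `_rule` suffix in the model name but otherwise unknown), and all hyperparameters are assumptions, not the author's actual configuration.

```python
# Hypothetical GRPO fine-tuning sketch using TRL; dataset, reward,
# and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset with a "prompt" column (assumption).
train_dataset = load_dataset("trl-lib/tldr", split="train")

def rule_based_reward(completions, **kwargs):
    # Toy rule-based reward favoring concise completions; the actual
    # rule used for this model is not documented.
    return [1.0 if len(c.split()) < 100 else 0.0 for c in completions]

training_args = GRPOConfig(output_dir="qwen2.5-1.5b-grpo", logging_steps=10)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # the stated base model
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```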

Use Cases

Given its GRPO-enhanced training, this model is particularly well-suited for:

  • Mathematical problem-solving: tasks that require step-by-step logical deduction and numerical accuracy.
  • Reasoning-intensive applications: scenarios where robust logical inference is paramount.
  • Instruction following: building on its instruction-tuned base, it can follow complex directives, especially those with a reasoning component.

Developers can integrate this model with the Hugging Face transformers library for text generation, as in the sketch below.
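A minimal inference sketch, assuming the standard Qwen2.5 chat template is bundled with the checkpoint and using the transformers text-generation pipeline; the prompt is illustrative.

```python
# Minimal inference sketch using the transformers pipeline;
# the prompt is an illustrative example.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_After_1p0_0p0_1p0_grpo_42_rule",
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
)

messages = [
    {"role": "user", "content": "If a train travels 120 km in 1.5 hours, what is its average speed?"},
]
output = generator(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last message is the reply.
print(output[0]["generated_text"][-1]["content"])
```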