Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_Since_1p0_0p0_1p0_grpo_42_rule

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Jan 13, 2026 · Architecture: Transformer

Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_Since_1p0_0p0_1p0_grpo_42_rule is a 1.5-billion-parameter instruction-tuned language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. With a context length of 131,072 tokens, it targets general text generation, and its GRPO training may particularly benefit tasks involving mathematical or structured reasoning.


Overview

This model, Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_Since_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of the Qwen2.5-1.5B-Instruct base model. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. GRPO samples a group of completions per prompt, scores them with a reward function, and normalizes each completion's reward against the group's statistics rather than a learned value model; this training approach aims to improve performance on tasks that benefit from advanced reasoning.
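
The group-relative idea at the heart of GRPO can be sketched in a few lines. This is a simplified illustration of the advantage computation, not this model's actual training code:

```python
# Simplified sketch of GRPO's group-relative advantage computation:
# each sampled completion's reward is normalized against the mean and
# standard deviation of its group, with no learned value model.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its group's statistics."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        # All completions scored the same: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# For a group of 4 completions scored by a binary rule-based reward:
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Correct completions receive positive advantages, incorrect ones negative.
```

Completions that beat their group's average are reinforced and the rest are penalized, which is why rule-based rewards (as the `_rule` suffix of this model's name suggests) pair naturally with GRPO.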

Key Characteristics

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct.
  • Parameter Count: 1.5 billion parameters.
  • Context Length: 131,072-token context window.
  • Training Method: Fine-tuned with GRPO, as detailed in the DeepSeekMath paper.
  • Frameworks: Developed using TRL (Transformer Reinforcement Learning) version 0.23.0, along with Transformers 4.57.1 and PyTorch 2.7.1+cu128.
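
The card does not ship a usage snippet, so the following is a minimal inference sketch using the standard Hugging Face Transformers chat-template API for Qwen2.5-style instruct models; the system prompt and generation settings are illustrative assumptions, not values from this card:

```python
# Hypothetical inference sketch for this model via Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_Since_1p0_0p0_1p0_grpo_42_rule"

def build_messages(user_prompt: str) -> list[dict]:
    """Build a chat-format message list as expected by Qwen2.5 chat templates."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Render the messages through the model's chat template.
    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, dropping the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate("Summarize this paragraph in one sentence: ...")` would return the model's completion as a string.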

Potential Use Cases

Given its instruction-tuned nature and GRPO-enhanced training, this model is suitable for:

  • General text generation and conversational AI.
  • Tasks requiring improved reasoning capabilities, potentially in mathematical or logical domains, due to its GRPO training.
  • Applications benefiting from a large context window for processing extensive inputs or generating detailed outputs.