KickItLikeShika/Qwen2.5-1.5B-Instruct-SFT-GRPO-GSM8K

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 21, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

KickItLikeShika/Qwen2.5-1.5B-Instruct-SFT-GRPO-GSM8K is a 1.5-billion-parameter instruction-tuned Qwen2.5 model developed by KickItLikeShika. It is optimized for grade-school mathematical reasoning, generating structured outputs that pair a scratchpad with a single final numerical answer. By detailing its reasoning process explicitly, the model suits applications that require transparent, verifiable mathematical problem-solving.


Model Overview

KickItLikeShika/Qwen2.5-1.5B-Instruct-SFT-GRPO-GSM8K is a 1.5-billion-parameter Qwen2.5-based instruction-tuned model, engineered specifically for grade-school mathematical reasoning. Its primary distinction is that it produces structured outputs for math problems: a detailed scratchpad within <reasoning>…</reasoning> tags followed by a single numerical answer within <answer>…</answer> tags.
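Because the output format is fixed, both fields can be pulled out of a completion with simple pattern matching. A minimal sketch (the function name and regexes are illustrative, not part of the model card):

```python
import re

def parse_solution(completion: str):
    """Extract the scratchpad and final answer from a structured completion.

    Expects the tag format described above:
    <reasoning>…</reasoning> followed by <answer>…</answer>.
    Returns (reasoning, answer), or (None, None) if the format is absent.
    """
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", completion, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if reasoning is None or answer is None:
        return None, None
    return reasoning.group(1).strip(), answer.group(1).strip()

completion = (
    "<reasoning>3 apples + 4 apples = 7 apples.</reasoning>\n"
    "<answer>7</answer>"
)
print(parse_solution(completion))  # → ('3 apples + 4 apples = 7 apples.', '7')
```

Returning `(None, None)` on malformed output lets downstream code treat format violations as failures rather than raising exceptions.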

Key Capabilities

  • Structured Mathematical Reasoning: Generates explicit step-by-step reasoning processes for math problems.
  • Grade-Level Math Proficiency: Fine-tuned to solve mathematical problems typically found at the grade school level.
  • Instruction-Tuned: Optimized to follow instructions for generating structured math solutions.

Training Methodology

The model underwent a two-stage training process:

  1. LoRA SFT: Initial fine-tuning with Low-Rank Adaptation (LoRA) on 100 randomly sampled GSM8K training examples. This stage taught the model the desired output format and how to produce roughly sensible reasoning traces.
  2. GRPO: Subsequent training with Group Relative Policy Optimization (GRPO) for 2,000 steps, building on the initial LoRA adapter.
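GRPO samples a group of completions per prompt, scores each with scalar reward functions, and normalizes rewards within the group to form advantages. A common recipe for this kind of task combines a format reward with a correctness reward; the model card does not document the exact rewards used, so the following is an illustrative sketch:

```python
import re

STRICT_FORMAT = re.compile(
    r"^<reasoning>.*?</reasoning>\s*<answer>.*?</answer>\s*$", re.DOTALL
)

def format_reward(completion: str) -> float:
    """1.0 if the completion matches the expected tag structure, else 0.0."""
    return 1.0 if STRICT_FORMAT.match(completion.strip()) else 0.0

def correctness_reward(completion: str, gold_answer: str) -> float:
    """2.0 if the extracted answer equals the reference answer, else 0.0."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 2.0 if m and m.group(1).strip() == gold_answer else 0.0

def group_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: reward minus the group mean."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

good = "<reasoning>2 + 2 = 4</reasoning>\n<answer>4</answer>"
bad = "The answer is 4."
rewards = [format_reward(good) + correctness_reward(good, "4"),
           format_reward(bad) + correctness_reward(bad, "4")]
print(group_advantages(rewards))  # → [1.5, -1.5]
```

Weighting correctness above format nudges the policy toward right answers once the tag structure is learned; the specific weights here are assumptions.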

Good For

  • Applications requiring transparent and verifiable mathematical problem-solving.
  • Educational tools that need to show step-by-step solutions to math problems.
  • Scenarios where a small, specialized model for math reasoning is preferred over larger, general-purpose LLMs.