KickItLikeShika/Qwen2.5-1.5B-Instruct-SFT-GRPO-GSM8K
The KickItLikeShika/Qwen2.5-1.5B-Instruct-SFT-GRPO-GSM8K is a 1.5-billion-parameter instruction-tuned Qwen2.5 model developed by KickItLikeShika. It is optimized for grade-school mathematical reasoning (the GSM8K domain), generating structured outputs with a reasoning scratchpad followed by a final numerical answer. By explicitly detailing its reasoning process, the model suits applications that require transparent mathematical problem-solving.
Model Overview
The KickItLikeShika/Qwen2.5-1.5B-Instruct-SFT-GRPO-GSM8K is a 1.5-billion-parameter Qwen2.5-based instruction-tuned model, engineered for grade-school mathematical reasoning. Its primary distinction is producing structured outputs for math problems: a detailed scratchpad inside <reasoning>…</reasoning> tags followed by a single numerical answer inside <answer>…</answer> tags.
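Because the completion format is tagged, downstream code can extract both parts mechanically. A minimal parsing sketch, assuming the tags appear exactly as described above:

```python
import re

def parse_solution(text: str):
    """Extract the scratchpad and final answer from a model completion.

    Assumes the tagged format described above:
    <reasoning>...</reasoning> followed by <answer>...</answer>.
    Returns (reasoning, answer), or (None, None) if either tag is missing.
    """
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if not (reasoning and answer):
        return None, None
    return reasoning.group(1).strip(), answer.group(1).strip()

# Illustrative completion (not actual model output):
completion = (
    "<reasoning>Each box holds 6 eggs and there are 3 boxes, "
    "so 6 * 3 = 18 eggs.</reasoning>\n<answer>18</answer>"
)
steps, final = parse_solution(completion)
print(final)  # -> 18
```

Returning `None` on malformed output makes it easy to count how often the model breaks format, which is useful when evaluating on GSM8K.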
Key Capabilities
- Structured Mathematical Reasoning: Generates explicit step-by-step reasoning processes for math problems.
- Grade-Level Math Proficiency: Fine-tuned to solve mathematical problems typically found at the grade school level.
- Instruction-Tuned: Optimized to follow instructions for generating structured math solutions.
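The card does not publish the exact prompt used during fine-tuning, but a chat-style prompt that elicits the tagged format can be built as follows; the system-prompt wording here is an assumption, not the model's documented training prompt:

```python
# Hypothetical system prompt; the exact wording used during SFT/GRPO
# is not stated in this model card.
SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>\n...\n</reasoning>\n"
    "<answer>\n...\n</answer>"
)

def build_messages(question: str) -> list[dict]:
    """Build a chat-format message list, suitable for passing to
    tokenizer.apply_chat_template(...) in the transformers library."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

messages = build_messages("A box holds 6 eggs. How many eggs are in 3 boxes?")
```

The resulting `messages` list can be fed through the tokenizer's chat template and generated from as with any Qwen2.5 instruct model.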
Training Methodology
The model underwent a two-stage training process:
- LoRA SFT: Initial fine-tuning using Low-Rank Adaptation (LoRA) on 100 randomly sampled GSM8K training examples. This stage taught the model the desired output format and roughly sensible reasoning traces.
- GRPO: Subsequent training using Group Relative Policy Optimization (GRPO) for 2,000 steps, building upon the initial LoRA adapter.
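GRPO optimizes the policy against scalar rewards computed over groups of sampled completions. The card does not state which rewards were used; a common GSM8K setup combines a format reward with an answer-correctness reward, sketched below (the reward definitions and weights are assumptions, following the per-completion reward-function convention used by libraries such as TRL's GRPOTrainer):

```python
import re

def format_reward(completion: str) -> float:
    """Small reward for following the <reasoning>/<answer> template."""
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return 0.5 if re.search(pattern, completion, re.DOTALL) else 0.0

def correctness_reward(completion: str, gold_answer: str) -> float:
    """Larger reward when the <answer> matches the GSM8K gold answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m and m.group(1).strip() == gold_answer.strip():
        return 2.0
    return 0.0

def reward(completion: str, gold_answer: str) -> float:
    # Hypothetical combined reward; actual weights used in training unknown.
    return format_reward(completion) + correctness_reward(completion, gold_answer)

print(reward("<reasoning>3 * 6 = 18</reasoning><answer>18</answer>", "18"))  # 2.5
```

Separating the format and correctness terms lets the model first earn partial reward for emitting well-formed tags, then full reward for correct answers, which stabilizes early RL training.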
Good For
- Applications requiring transparent and verifiable mathematical problem-solving.
- Educational tools that need to show step-by-step solutions to math problems.
- Scenarios where a small, specialized model for math reasoning is preferred over larger, general-purpose LLMs.