saadxsalman/Q-SS-0.5B-Reasoning-Math

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.5BQuant:BF16Ctx Length:32kPublished:May 13, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

Q-SS-0.5B-Reasoning-Math by Saad Salman is a 0.5 billion parameter language model, fine-tuned from Qwen2.5-0.5B-Instruct with a 32768 token context length. It utilizes Group Relative Policy Optimization (GRPO) reinforcement learning to excel at mathematical reasoning, providing explicit step-by-step thoughts and structured final answers. This model is specifically designed for transparent and parseable solutions to mathematical problems.

Loading preview...

Q-SS-0.5B-Reasoning-Math: A Structured Mathematical Reasoning Model

This model, developed by Saad Salman, is a 0.5 billion parameter variant of Qwen/Qwen2.5-0.5B-Instruct, specifically optimized for mathematical reasoning. It was trained using Group Relative Policy Optimization (GRPO), a reinforcement learning technique, on datasets like GSM8K and OpenR1-Math-220k.

Key Capabilities & Features

  • Explicit Reasoning: Generates step-by-step thought processes within <thought> tags before providing an answer.
  • Structured Output: Delivers final numerical answers in a clean, parseable format within <answer> tags.
  • RL-Trained: Benefits from reinforcement learning, learning from reward signals rather than just imitation.
  • Fine-tunable: Provided with full FP16 weights, allowing for further training or fine-tuning.
  • Apache 2.0 License: Free for both personal and commercial use.

Ideal Use Cases

This model is particularly well-suited for:

  • Basic Arithmetic: Reliable for fundamental calculations.
  • Multi-step Word Problems: Capable of handling complex word problems requiring sequential reasoning.
  • Problems with Units and Currency: Accurately processes problems involving various units and monetary values.

While effective for these tasks, it has limitations with complex abstract reasoning, geometry, calculus, and advanced competition math. A lightweight CPU-optimized GGUF version is also available for local inference.