Edmon02/mathphd-plus-plus-0.5b

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32K · Published: Apr 27, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Edmon02/mathphd-plus-plus-0.5b is a 0.5 billion parameter language model, fine-tuned from Qwen2.5-0.5B-Instruct and designed specifically for mathematical reasoning in natural language. It targets step-by-step math word problems and competition-style reasoning, formatting its output with structured thinking and answer tags. The model is a reproducible checkpoint for research on math LLMs, small enough to deploy on consumer-grade GPUs, with a 32K context length.


Overview

Edmon02/mathphd-plus-plus-0.5b is a 0.5 billion parameter language model, fine-tuned from Qwen2.5-0.5B-Instruct and developed by Edmon (Edmon02) as a community research project. It is engineered for mathematical reasoning expressed in natural language. Training combines supervised fine-tuning (SFT) on curated math instruction data that uses structured <thinking> and <answer> tags, optional process reward modeling (PRM), and reinforcement learning from verifiable rewards via GRPO, with SymPy-backed correctness checks supplying the reward signal.
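
The SymPy-backed correctness check is the piece that makes the GRPO rewards verifiable. The sketch below shows one plausible shape for such a check; the function names, tag parsing, and error handling are illustrative assumptions, not taken from the actual training code.

```python
# Hedged sketch of a SymPy-backed verifiable reward of the kind the card
# describes for GRPO training. extract_answer and reward are illustrative
# names; the real training code may differ.
import re

import sympy
from sympy.parsing.sympy_parser import parse_expr


def extract_answer(text: str) -> str | None:
    """Pull the content of the last <answer>...</answer> block, if any."""
    matches = re.findall(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


def reward(completion: str, reference: str) -> float:
    """Return 1.0 when the completion's final answer is symbolically
    equal to the reference expression, else 0.0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    try:
        # Symbolic equality: simplify(answer - reference) reduces to 0.
        diff = sympy.simplify(parse_expr(answer) - parse_expr(reference))
        return 1.0 if diff == 0 else 0.0
    except Exception:
        # Unparseable answers (units, prose, malformed math) earn no reward.
        return 0.0
```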

Key Capabilities

  • Mathematical Reasoning: Specialized in solving step-by-step math word problems and competition-style reasoning, including informal proofs and chain-of-thought processes.
  • Structured Output: Formats assistant outputs with reasoning blocks and final answers to encourage verifiable extraction.
  • Efficient Deployment: Designed as a reproducible checkpoint for research, suitable for experimentation on single consumer or Colab GPUs due to its small size.
  • ChatML Format: Uses ChatML (<|im_start|> / <|im_end|>) for chat interactions; see the inference sketch after this list.
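
Because the checkpoint uses a ChatML chat template, inference with transformers can go through apply_chat_template. The example problem and generation settings below are illustrative, not recommendations from the model card.

```python
# Minimal inference sketch; assumes the checkpoint's tokenizer ships the
# ChatML chat template described above. Generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Edmon02/mathphd-plus-plus-0.5b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user",
     "content": "A train travels 120 km in 2 hours. What is its average speed?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```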

Training and Performance

The model was fine-tuned on a mix of public datasets, including MetaMath-style QA, Competition MATH, GSM8K, OpenMathInstruct-2, and NuminaMath-CoT. Preliminary evaluations, capped at 200 samples per benchmark, measured 18.5% accuracy on GSM8K and 6.0% on MATH, reflecting the model's capacity limits at this scale while still showing that SFT helps on GSM8K.
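
A capped evaluation in the style reported above could look like the following sketch. The generate_completion helper is a hypothetical stand-in for your own inference call, and extract_answer is reused from the reward sketch earlier in this card; real GSM8K scoring would also normalize numbers (commas, trailing units) before comparing.

```python
# Hedged sketch of a 200-sample GSM8K evaluation in the style reported above.
# generate_completion is a hypothetical inference wrapper; extract_answer is
# the helper from the earlier reward sketch.
from datasets import load_dataset

dataset = load_dataset("gsm8k", "main", split="test").select(range(200))

correct = 0
for example in dataset:
    # GSM8K references end with "#### <final answer>".
    reference = example["answer"].split("####")[-1].strip()
    completion = generate_completion(example["question"])  # hypothetical call
    prediction = extract_answer(completion)
    correct += int(prediction == reference)

print(f"accuracy: {correct / len(dataset):.1%}")
```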

Limitations

  • Capacity-Limited: Due to its small size, it may underperform larger models on complex competition math and lengthy proofs.
  • Informal Reasoning: Outputs are not formally verified and require external proof checkers or code execution for validation.
  • Language Specificity: Primarily focused on English mathematical text; performance on mixed-language or non-math prompts is not guaranteed.

Good for

  • Researchers and developers experimenting with math-focused LLMs on resource-constrained hardware.
  • Applications requiring step-by-step mathematical problem-solving and reasoning in natural language.
  • Exploring the impact of structured fine-tuning and reinforcement learning techniques on mathematical tasks.