Edmon02/mathphd-plus-plus-0.5b
Edmon02/mathphd-plus-plus-0.5b is a 0.5 billion parameter language model, fine-tuned from Qwen2.5-0.5B-Instruct, specifically designed for mathematical reasoning in natural language. It excels at step-by-step math word problems and competition-style reasoning, utilizing structured thinking and answer tags. This model is a reproducible checkpoint for research on math LLMs, optimized for deployment on consumer-grade GPUs with a 32K context length.
Loading preview...
Overview
Edmon02/mathphd-plus-plus-0.5b is a 0.5 billion parameter language model, fine-tuned from Qwen2.5-0.5B-Instruct, developed by Edmon (Edmon02) as a community research project. It is specifically engineered for mathematical reasoning, focusing on natural language problem-solving. The model utilizes a supervised fine-tuning (SFT) approach on curated math instruction data, incorporating structured <thinking> and <answer> tags, optional process reward modeling (PRM), and reinforcement learning from verifiable rewards (GRPO) with SymPy-backed correctness checks.
Key Capabilities
- Mathematical Reasoning: Specialized in solving step-by-step math word problems and competition-style reasoning, including informal proofs and chain-of-thought processes.
- Structured Output: Formats assistant outputs with reasoning blocks and final answers to encourage verifiable extraction.
- Efficient Deployment: Designed as a reproducible checkpoint for research, suitable for experimentation on single consumer or Colab GPUs due to its small size.
- ChatML Format: Uses ChatML (
<|im_start|>/<|im_end|>) for chat interactions.
Training and Performance
The model was fine-tuned using a mix of public datasets including MetaMath-style QA, Competition MATH, GSM8K, OpenMathInstruct-2, and NuminaMath-CoT. Preliminary evaluations on a 200-sample cap showed an accuracy of 18.5% on GSM8K and 6.0% on MATH, indicating its capacity limitations at this scale but demonstrating the effectiveness of SFT for GSM8K.
Limitations
- Capacity-Limited: Due to its small size, it may underperform larger models on complex competition math and lengthy proofs.
- Informal Reasoning: Outputs are not formally verified and require external proof checkers or code execution for validation.
- Language Specificity: Primarily focused on English mathematical text; performance on mixed-language or non-math prompts is not guaranteed.
Good for
- Researchers and developers experimenting with math-focused LLMs on resource-constrained hardware.
- Applications requiring step-by-step mathematical problem-solving and reasoning in natural language.
- Exploring the impact of structured fine-tuning and reinforcement learning techniques on mathematical tasks.