Name: Edmon02/mathphd-plus-plus-0.5b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Edmon02

Overview

Edmon02/mathphd-plus-plus-0.5b is a 0.5 billion parameter language model, fine-tuned from Qwen2.5-0.5B-Instruct, developed by Edmon (Edmon02) as a community research project. It is specifically engineered for mathematical reasoning, focusing on natural language problem-solving. The model utilizes a supervised fine-tuning (SFT) approach on curated math instruction data, incorporating structured <thinking> and <answer> tags, optional process reward modeling (PRM), and reinforcement learning from verifiable rewards (GRPO) with SymPy-backed correctness checks.

Key Capabilities

Mathematical Reasoning: Specialized in solving step-by-step math word problems and competition-style reasoning, including informal proofs and chain-of-thought processes.
Structured Output: Formats assistant outputs with reasoning blocks and final answers to encourage verifiable extraction.
Efficient Deployment: Designed as a reproducible checkpoint for research, suitable for experimentation on single consumer or Colab GPUs due to its small size.
ChatML Format: Uses ChatML (<|im_start|> / <|im_end|>) for chat interactions.

Training and Performance

The model was fine-tuned using a mix of public datasets including MetaMath-style QA, Competition MATH, GSM8K, OpenMathInstruct-2, and NuminaMath-CoT. Preliminary evaluations on a 200-sample cap showed an accuracy of 18.5% on GSM8K and 6.0% on MATH, indicating its capacity limitations at this scale but demonstrating the effectiveness of SFT for GSM8K.

Limitations

Capacity-Limited: Due to its small size, it may underperform larger models on complex competition math and lengthy proofs.
Informal Reasoning: Outputs are not formally verified and require external proof checkers or code execution for validation.
Language Specificity: Primarily focused on English mathematical text; performance on mixed-language or non-math prompts is not guaranteed.

Good for

Researchers and developers experimenting with math-focused LLMs on resource-constrained hardware.
Applications requiring step-by-step mathematical problem-solving and reasoning in natural language.
Exploring the impact of structured fine-tuning and reinforcement learning techniques on mathematical tasks.

Overview

Overview

Key Capabilities

Training and Performance

Limitations

Good for

Full Model Card (README)