p1atdev/qwen2.5-0.5b-grpo-math-01
p1atdev/qwen2.5-0.5b-grpo-math-01 is a 0.5 billion parameter Qwen2.5-based causal language model fine-tuned by p1atdev using GRPO. This model is specifically optimized for solving simple arithmetic problems involving addition and multiplication, demonstrating a structured thought process before providing an answer. It excels at generating step-by-step reasoning for basic math formulas like 'A + B * C' or 'A * B + C'.
Model Overview
p1atdev/qwen2.5-0.5b-grpo-math-01 is a compact 0.5 billion parameter language model built upon the Qwen2.5 architecture. It has been fine-tuned using GRPO (Group Relative Policy Optimization), specifically targeting the domain of simple arithmetic problem-solving. The model's training focused on generating structured responses that include a thinking process (`<think></think>`) before delivering the final answer (`<answer></answer>`).
Key Capabilities
- Arithmetic Problem Solving: Specialized in calculating results for basic formulas of the form `A + B * C` or `A * B + C`.
- Structured Reasoning: Generates a detailed thought process within `<think>` tags, followed by the numerical answer in `<answer>` tags, adhering to a specific prompt format.
- Efficient Training: Trained on a single A100 80G GPU for approximately 1 hour over 140 steps, demonstrating efficient fine-tuning for a niche task.
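The structured output lends itself to programmatic consumption. Below is a minimal parsing sketch: the `<think>`/`<answer>` tag names come from the model card, while the helper function itself is illustrative, not part of the model's tooling.

```python
import re

def parse_structured_output(text: str):
    """Extract the reasoning and final answer from a model completion.

    Assumes the completion follows the <think>...</think><answer>...</answer>
    format this model was trained to produce; returns None for missing parts.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )

completion = "<think>3 + 4 * 2 = 3 + 8 = 11</think><answer>11</answer>"
reasoning, answer = parse_structured_output(completion)
```

With the example completion above, `reasoning` holds the step-by-step arithmetic and `answer` holds `"11"`, which downstream code can validate or convert to a number.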
Training Details
The model was trained using a synthetic dataset of 100,000 simple arithmetic problems. Reward functions were implemented to encourage both correct output formatting and accurate numerical answers. The training utilized TRL's `GRPOTrainer` with bf16 precision.
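The two-part reward scheme described above could be sketched as follows. The actual reward functions are not reproduced in this card, so this is an illustrative reconstruction in the style accepted by TRL's `GRPOTrainer` (callables that return one score per completion); the function names and scoring values are assumptions.

```python
import re

# Completion must be exactly a <think> block followed by an <answer> block.
TAG_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)

def format_reward(completions, **kwargs):
    """Score 1.0 when a completion matches the <think>/<answer> layout, else 0.0."""
    return [1.0 if TAG_PATTERN.match(c.strip()) else 0.0 for c in completions]

def accuracy_reward(completions, answers, **kwargs):
    """Score 1.0 when the extracted answer equals the reference answer, else 0.0."""
    rewards = []
    for completion, reference in zip(completions, answers):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        rewards.append(1.0 if match and match.group(1).strip() == reference else 0.0)
    return rewards
```

In a TRL setup, both callables would typically be passed together (e.g. `reward_funcs=[format_reward, accuracy_reward]`) so the policy is rewarded for structure and correctness independently.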
Use Cases
This model is ideal for applications requiring accurate and explainable solutions to simple arithmetic problems, particularly where a structured reasoning output is beneficial. Its small size makes it suitable for deployment in resource-constrained environments or for integration into larger systems as a specialized math co-processor for basic calculations.