khazarai/Math-RL

Text generation · Concurrency cost: 1 · Model size: 0.5B · Quantization: BF16 · Context length: 32K · Published: Mar 24, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

khazarai/Math-RL is a 0.5 billion parameter language model, fine-tuned from Qwen2.5-0.5B-Instruct using Group Relative Policy Optimization (GRPO) on a curated dataset of 700 math problems. This model is specifically optimized to enhance step-by-step reasoning for mathematical problem-solving. It is designed for educational assistance, research into small-scale RLHF-style fine-tuning, and as a lightweight math reasoning assistant in constrained environments.
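Because the model is fine-tuned from Qwen2.5-0.5B-Instruct and inherits its chat template, a standard transformers workflow should be enough to try it out. The snippet below is a minimal inference sketch; the prompt wording and generation settings are illustrative assumptions, not values published with the model.

```python
# Minimal inference sketch (prompt and decoding settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "khazarai/Math-RL"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Solve step by step: If 3x + 7 = 22, what is x?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps the step-by-step chain deterministic for inspection.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```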


Model Overview

khazarai/Math-RL is a 0.5 billion parameter model, fine-tuned from Qwen2.5-0.5B-Instruct. Its primary objective is to improve mathematical problem-solving through enhanced step-by-step reasoning. The model was optimized using Group Relative Policy Optimization (GRPO) with LoRA on a dataset of approximately 700 math problems.
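For readers who want to reproduce this style of fine-tuning, the sketch below shows one way to combine GRPO with LoRA using TRL and PEFT. It is only a sketch under assumptions: the actual training script, dataset, reward function, and hyperparameters behind Math-RL are not published, so the toy prompts and exact-match reward here are stand-ins.

```python
# Illustrative GRPO + LoRA setup with TRL and PEFT; not the repository's script.
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Toy dataset: GRPO expects a "prompt" column; "answer" is an extra field of our own
# that TRL forwards to the reward function as a keyword argument.
train_dataset = Dataset.from_list([
    {"prompt": "Solve step by step: what is 12 * 7?", "answer": "84"},
    {"prompt": "Solve step by step: what is 45 + 38?", "answer": "83"},
])

def correctness_reward(completions, answer, **kwargs):
    # Stand-in reward: 1.0 when the reference answer appears in the completion.
    return [1.0 if a in c else 0.0 for c, a in zip(completions, answer)]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=correctness_reward,
    args=GRPOConfig(output_dir="math-rl-grpo", num_generations=4),
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```

In practice a real reward would parse the final boxed or numeric answer rather than substring-match, but the structure (prompt column, group sampling via num_generations, LoRA adapters instead of full fine-tuning) is what keeps this feasible at the 0.5B scale.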

Key Capabilities

  • Mathematical Reasoning: Specialized in generating step-by-step reasoning for math problems.
  • Small-Scale RLHF Research: Suitable for experiments with GRPO, a form of RLHF-style fine-tuning, on smaller instruction-tuned models.
  • Lightweight Deployment: Designed to function as a math reasoning assistant in environments with limited computational resources.
  • Educational Support: Can assist students with understanding and solving mathematical problems.

Intended Use Cases

  • Educational Tools: Integrating the model into platforms for math homework help or tutoring.
  • Research & Development: Exploring the effectiveness of GRPO and similar fine-tuning methods on reasoning tasks.
  • Resource-Constrained Applications: Deploying math assistance where larger models are impractical (see the loading sketch after this list).
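For the resource-constrained case, a 0.5B model in BF16 needs only about 1 GB of memory for weights, so CPU-only inference is usually workable. The snippet below is a minimal CPU loading sketch using the transformers pipeline API; the prompt is illustrative.

```python
# Minimal CPU-only sketch for constrained environments (illustrative prompt).
from transformers import pipeline

generator = pipeline("text-generation", model="khazarai/Math-RL", device="cpu")
messages = [{"role": "user", "content": "Solve step by step: what is 15% of 240?"}]
result = generator(messages, max_new_tokens=256, do_sample=False)

# For chat-style input the pipeline returns the full conversation; the last
# message holds the model's reply.
print(result[0]["generated_text"][-1]["content"])
```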

Limitations

Because the model was fine-tuned on a relatively small dataset (approximately 700 problems), its generalization to diverse math problems is limited. It may produce incorrect or hallucinated answers and should not be relied on for high-stakes calculations or critical applications. Performance is strongest on problems similar to the training data, and although the base model is multilingual, the math-specific fine-tuning was primarily English-based.