Name: zhaohq/PureRL-1.5B-v6b4-detailed-fmt03 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

zhaohq/PureRL-1.5B-v6b4-detailed-fmt03 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model. It leverages the GRPO (Generalized Reinforcement Learning for Policy Optimization) method, as introduced in the DeepSeekMath paper, to enhance its mathematical reasoning capabilities. This model is built using the TRL (Transformer Reinforcement Learning) framework.

Key Capabilities

Enhanced Mathematical Reasoning: Specialized training with GRPO significantly improves its ability to handle complex mathematical problems.
Reinforcement Learning Fine-tuning: Utilizes advanced reinforcement learning techniques for performance optimization.
Qwen2.5-Math Base: Benefits from the strong mathematical foundation of its Qwen2.5-Math-1.5B progenitor.

Good for

Applications requiring robust mathematical problem-solving.
Research and development in reinforcement learning for language models.
Tasks that demand logical deduction and numerical accuracy.
Developers looking for a compact model with strong mathematical aptitude.

Overview

Overview

Key Capabilities

Good for

Full Model Card (README)