Name: zhaohq/PureRL-1.5B-v5-06-umsp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

zhaohq/PureRL-1.5B-v5-06-umsp is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model. It was developed by zhaohq and trained with the TRL (Transformer Reinforcement Learning) framework. Its key differentiator is the use of GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO samples a group of completions for each prompt and scores each one relative to the others in its group, with the goal of improving the model's mathematical reasoning.
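
The group-relative scoring at the core of GRPO can be illustrated with a minimal sketch. This is not the training code used for this model. The function name `group_relative_advantages`, the `eps` stabilizer, and the choice of population standard deviation are assumptions made for illustration; it shows only how rewards for several completions of the same prompt are normalized against their own group.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration: each advantage is the reward minus
    the group mean, divided by the group standard deviation (plus eps to
    avoid division by zero when every reward is identical).
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four sampled answers to one math problem, reward 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers receive positive advantages and incorrect ones negative, so no separate value model is needed to provide a baseline. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.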

Key Capabilities

Enhanced Mathematical Reasoning: Trained with the GRPO method to target multi-step mathematical problems.
Reinforcement Learning Fine-tuning: Trained with the TRL framework rather than supervised fine-tuning alone.
Qwen2.5-Math Base: Builds on Qwen/Qwen2.5-Math-1.5B, a base model designed for mathematical tasks.

Good for

Applications that require mathematical problem-solving.
Research and development in reinforcement learning for language models.
Tasks where logical deduction and numerical accuracy are critical.
