Name: zhaohq/PureRL-1.5B-v6i-A-step01-final01 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

This model, PureRL-1.5B-v6i-A-step01-final01, is a 1.5 billion parameter language model developed by zhaohq. It is a fine-tuned version of the Qwen/Qwen2.5-Math-1.5B base model, leveraging the TRL (Transformer Reinforcement Learning) framework for its training. The model's development specifically incorporated the GRPO method, as introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).

Key Capabilities

Enhanced Mathematical Reasoning: The primary focus of this model's fine-tuning is to improve its ability to handle complex mathematical problems and reasoning tasks, building upon its math-focused base model.
Reinforcement Learning Optimization: Utilizes the GRPO method for training, which is designed to push the boundaries of mathematical reasoning performance in open language models.

When to Use This Model

Mathematical Problem Solving: Ideal for applications requiring accurate and robust mathematical reasoning, such as solving equations, proofs, or complex quantitative analysis.
Research in RL for LLMs: Useful for researchers exploring the application of reinforcement learning techniques, specifically GRPO, to enhance specialized capabilities in language models.

Overview

Overview

Key Capabilities

When to Use This Model

Full Model Card (README)