Name: zhaohq/PureRL-1.5B-v6b1-bare-fmt01 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v6b1-bare-fmt01 is a 1.5-billion-parameter language model derived from the Qwen/Qwen2.5-Math-1.5B base model. It was fine-tuned with the TRL library to improve its performance on mathematical reasoning tasks.

Key Differentiator

The primary distinction of this model lies in its training methodology. It uses GRPO (Group Relative Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO samples a group of completions for each prompt and scores every completion relative to the rest of its group, which removes the need for a separate value model. Here it is applied to improve the model's handling of multi-step mathematical problems and logical deductions.
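
As a rough illustration of the group-relative idea, the sketch below computes per-completion advantages by normalizing each reward against the mean and standard deviation of its own group, following the outcome-supervision formulation in the DeepSeekMath paper. The function name, the `eps` term, and the choice of population standard deviation are assumptions made for illustration; they are not taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    `eps` (an assumption) guards against division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt: two correct (1.0), two wrong (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # correct completions get positive values, wrong ones negative
```

Completions that beat their group's average are reinforced and the rest are penalized. When all completions in a group score the same, every advantage is zero and that prompt contributes no learning signal.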

Training Details

Base Model: Qwen/Qwen2.5-Math-1.5B
Fine-tuning Framework: TRL (Transformer Reinforcement Learning)
Optimization Method: GRPO, as detailed in the DeepSeekMath paper.
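
The listing does not say which reward signal was used during training. Purely as an illustration, a custom reward function for TRL's GRPOTrainer is a callable that receives the sampled completions (with dataset columns passed as keyword arguments) and returns one float per completion. The sketch below assumes plain-string completions, a hypothetical `answer` dataset column, and final answers written as `\boxed{...}`; none of these are confirmed for this model.

```python
import re

# Matches \boxed{...} with no nested braces (a simplifying assumption).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def boxed_answer_reward(completions, answer, **kwargs):
    """Return 1.0 when the last \\boxed{...} answer equals the reference, else 0.0.

    `answer` is a hypothetical dataset column holding reference answers.
    """
    rewards = []
    for completion, target in zip(completions, answer):
        matches = BOXED.findall(completion)
        predicted = matches[-1].strip() if matches else None
        rewards.append(1.0 if predicted == str(target).strip() else 0.0)
    return rewards


sample_rewards = boxed_answer_reward(
    ["Adding gives \\boxed{42}", "I am not sure."],
    answer=["42", "7"],
)
print(sample_rewards)  # [1.0, 0.0]
```

A function of this shape could be passed in the `reward_funcs` argument of GRPOTrainer. Exact string matching is deliberately strict; a real setup would likely normalize equivalent forms such as `1/2` and `0.5`.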

Use Cases

This model is suited to applications that require mathematical reasoning and problem-solving. Developers can use it for tasks such as:

Solving mathematical equations and word problems.
Generating logical explanations for mathematical concepts.
Assisting in educational tools focused on mathematics.
Other applications that depend on accurate mathematical reasoning.
