Name: zhaohq/PureRL-1.5B-v12B-lam005 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

zhaohq/PureRL-1.5B-v12B-lam005 is a 1.5-billion-parameter language model derived from the Qwen/Qwen2.5-Math-1.5B base. It was fine-tuned with the TRL library using GRPO (Group Relative Policy Optimization), a reinforcement learning training method.

Key Capabilities

Enhanced Mathematical Reasoning: The model's training with GRPO, a method detailed in the DeepSeekMath paper, suggests a focus on improving mathematical problem-solving and logical deduction.
Qwen2.5-Math Foundation: Built upon a model specifically designed for mathematical tasks, it inherits and refines capabilities in this domain.
32K Context Length: Supports processing longer inputs and generating more extensive responses.

Training Details

The model was trained with the TRL framework (version 0.16.0.dev0) using GRPO, the method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO samples a group of candidate answers for each prompt, scores them, and reinforces the answers that score above their group's average, with the aim of producing more accurate and coherent mathematical solutions.
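
The group-relative scoring at the core of GRPO can be illustrated with a minimal sketch. This is not the training code used for this model: the function name, the reward values, and the use of the population standard deviation are assumptions chosen for illustration. It only shows how each sampled answer's reward is normalized against the other answers drawn for the same prompt.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per sampled answer to a single prompt.
    `eps` avoids division by zero when every answer gets the same reward.
    (Hypothetical helper; population std is an assumption of this sketch.)
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: 1.0 for a correct final answer, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Answers above the group average get a positive advantage,
# answers below it get a negative one.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

When every answer in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason the method depends on prompts where sampled answers differ in quality.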

Good For

Applications requiring strong mathematical reasoning.
Tasks involving complex problem-solving and logical inference.
Research into reinforcement learning techniques for language models, particularly GRPO.
