Name: zhaohq/PureRL-1.5B-v13D-lam025 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

This model, zhaohq/PureRL-1.5B-v13D-lam025, is a 1.5 billion parameter language model derived from the Qwen/Qwen2.5-Math-1.5B base model. It has been fine-tuned using Reinforcement Learning (RL) via the TRL library, specifically implementing the GRPO method. The GRPO method was first introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", suggesting an emphasis on robust reasoning capabilities.

Key Capabilities

Reinforcement Learning Fine-tuning: Utilizes the GRPO method for training, which is known for enhancing mathematical reasoning in large language models.
Base Model: Built upon Qwen2.5-Math-1.5B, indicating a foundation in mathematical and logical understanding.
Context Length: Supports a substantial context window of 32768 tokens, allowing for processing longer inputs and generating more coherent, extended responses.

Good For

General Text Generation: Capable of generating human-like text for various prompts, as demonstrated by the quick start example.
Reasoning-based Tasks: Given its lineage and training method, it may perform well in tasks requiring logical inference or structured problem-solving.
Exploration of RL-tuned Models: Developers interested in models fine-tuned with advanced RL techniques like GRPO can use this as a reference or starting point.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)