Name: zhaohq/PureRL-1.5B-v12D-lam025 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

PureRL-1.5B-v12D-lam025 is a 1.5-billion-parameter language model developed by zhaohq. It is fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model using the TRL (Transformer Reinforcement Learning) library.

Key Capabilities

Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper ("Pushing the Limits of Mathematical Reasoning in Open Language Models").
Reinforcement Learning Fine-tuning: Trained with the TRL library, which optimizes the model's outputs against a reward signal; this may improve instruction following and task-specific responses.
Context Length: Supports a context window of 32768 tokens, allowing it to process and generate longer sequences of text.
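
To make the GRPO idea above concrete, here is a minimal sketch of the group-relative advantage step: several answers are sampled for one prompt, and each answer's reward is normalized against the mean and standard deviation of its own group. The function name, the reward values, the use of the population standard deviation, and the eps constant are illustrative assumptions, not details taken from this model's training setup.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every sample in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that score above their group's average receive a positive advantage and are reinforced; answers below it receive a negative one. A group in which every answer earns the same reward yields zero advantage for all samples, so it contributes no learning signal.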

Good For

Applications requiring advanced mathematical problem-solving.
Tasks benefiting from models fine-tuned with reinforcement learning techniques.
Scenarios where a 1.5 billion parameter model with a large context window is suitable for balancing performance and computational resources.
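
As a rough guide to the resource side of that trade-off, the sketch below estimates the memory needed just to hold the weights at a few common numeric precisions. It assumes a nominal 1.5 billion parameters (the exact count may differ) and ignores the KV cache, activations, and runtime overhead, all of which grow with context length; the function name is hypothetical.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory for the weights alone, in GiB (excludes KV cache and activations)."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 1.5e9  # nominal size from the listing; the exact count may differ

for name, width in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gib(NUM_PARAMS, width):.2f} GiB")
```

At 16-bit precision this comes to a little under 3 GiB for the weights, which is why a model of this size is a plausible fit for a single consumer GPU.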
