Name: zhaohq/PureRL-1.5B-v7-s2-l1-maskoff API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-l1-maskoff is a 1.5 billion parameter language model developed by zhaohq. This model has been fine-tuned using the TRL framework and incorporates the GRPO (Generalized Reinforcement Learning with Policy Optimization) training method. GRPO, detailed in the "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" paper, is designed to significantly improve a model's mathematical and logical reasoning abilities.

Key Capabilities

Enhanced Reasoning: Leverages the GRPO training method for improved performance on complex reasoning tasks, particularly in mathematical domains.
Fine-tuned Architecture: Built upon an unspecified base model and refined using the TRL library, indicating a focus on reinforcement learning from human feedback or similar optimization.
Extended Context Window: Supports a substantial context length of 32768 tokens, allowing for the processing of longer and more intricate problem descriptions or dialogues.

Training Details

The model's training procedure is publicly viewable via Weights & Biases, providing transparency into its development. It was trained with specific versions of key frameworks:

TRL: 0.16.0.dev0
Transformers: 4.48.3
Pytorch: 2.5.1
Datasets: 4.0.0
Tokenizers: 0.21.1

Good For

Applications requiring strong mathematical problem-solving.
Tasks that benefit from advanced logical deduction.
Research into reinforcement learning-based fine-tuning for reasoning.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)