Name: zhaohq/PureRL-1.5B-v7-s2-l2-kl-w3-b2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-l2-kl-w3-b2 is a 1.5-billion-parameter language model fine-tuned with the TRL (Transformer Reinforcement Learning) library. Its training used GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than relying on a separate learned value model. The use of GRPO suggests, but does not confirm, a focus on improving the model's mathematical reasoning.
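The group-relative scoring at the heart of GRPO can be illustrated with a small sketch. This is not the training code for this model. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration. It shows only the outcome-reward normalization step, in which each completion's reward is compared with the mean and spread of its own group.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for a single prompt.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from the same prompt,
# e.g. 1.0 when the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat their group's average receive a positive advantage and the rest a negative one. When all completions in a group score the same, every advantage is zero and that group contributes no learning signal.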

Key Training Details

Fine-tuning Framework: TRL (version 0.16.0.dev0)
Optimization Method: GRPO, as described in the "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" paper.
Framework Versions: Transformers 4.48.3, PyTorch 2.5.1, Datasets 4.0.0, and Tokenizers 0.21.1.
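To approximate the listed environment, it can help to compare installed package versions against the ones above. The following is a minimal sketch, assuming the usual PyPI distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`). The helper names `installed_versions` and `find_mismatches` are hypothetical and not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed under Key Training Details; the distribution names are assumed.
PINNED = {
    "trl": "0.16.0.dev0",
    "transformers": "4.48.3",
    "torch": "2.5.1",
    "datasets": "4.0.0",
    "tokenizers": "0.21.1",
}


def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pinned, installed):
    """Return {name: (expected, actual)} for every package that differs."""
    return {
        name: (expected, installed.get(name))
        for name, expected in pinned.items()
        if installed.get(name) != expected
    }


if __name__ == "__main__":
    for name, (expected, actual) in find_mismatches(
        PINNED, installed_versions(PINNED)
    ).items():
        print(f"{name}: expected {expected}, found {actual}")
```

The comparison is an exact string match, so a local build tag such as a CUDA suffix on the PyTorch version would be reported as a mismatch. Loosen the check if that is too strict for your setup.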

Potential Use Cases

Mathematical Reasoning: Given its GRPO training, the model is likely suited to mathematical problem-solving and logical deduction.
Research and Development: Useful for researchers studying reinforcement learning in language model fine-tuning, particularly applications of GRPO.
