Name: zhaohq/PureRL-1.5B-v7-s2-l2-maskon-fixed API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-l2-maskon-fixed is a 1.5-billion-parameter language model developed by zhaohq. It was fine-tuned with TRL (Transformer Reinforcement Learning), a Hugging Face library for post-training language models with reinforcement learning methods.

Key Training Methodology

A distinguishing feature of this model is its training procedure, which uses GRPO (Group Relative Policy Optimization). This method was introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO samples a group of completions for each prompt and scores each completion relative to the others in its group, which removes the need for a separate value model. Its use here suggests an emphasis on complex reasoning tasks, potentially extending beyond pure mathematics.
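
To make the idea behind GRPO concrete, the sketch below computes group-relative advantages for one prompt: each completion's reward is normalized by the mean and standard deviation of the rewards in its group. This is a minimal illustration, not this model's training code. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of completions sampled for the same prompt.

    Hypothetical helper for illustration: advantage_i = (r_i - mean) / (std + eps).
    The population standard deviation and the eps value are assumptions.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt; two were judged correct (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above the group average receive a positive advantage and are reinforced, while those below it receive a negative one. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.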

Intended Use

This model is suitable for text generation tasks where logical consistency or step-by-step reasoning matters. Because GRPO originated in work on mathematical reasoning, its use here implies a focus on structured, coherent output, making the model a candidate for applications that require more than fluent text.
