Name: zhaohq/PureRL-7B-v7-s2-l2-maskon API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

The zhaohq/PureRL-7B-v7-s2-l2-maskon is a 7.6 billion parameter language model developed by zhaohq. It is a fine-tuned variant, built upon an unspecified base model, and trained using the Transformer Reinforcement Learning (TRL) framework.

Key Differentiator: GRPO Training

A core aspect of this model's development is its training with GRPO (Generalized Reinforcement Learning with Policy Optimization). This method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," is designed to significantly improve a model's reasoning abilities, especially in complex mathematical domains. This suggests the model is optimized for tasks requiring logical deduction and problem-solving.

Training Environment

The model was trained using specific versions of popular frameworks:

TRL: 0.16.0.dev0
Transformers: 4.57.6
PyTorch: 2.10.0
Datasets: 4.8.5
Tokenizers: 0.22.2

Potential Use Cases

Given its GRPO-enhanced training, this model is likely well-suited for:

Mathematical problem-solving
Logical reasoning tasks
Applications requiring precise and structured outputs

Developers can quickly get started with the provided transformers pipeline example for text generation.

Overview

Model Overview

Key Differentiator: GRPO Training

Training Environment

Potential Use Cases

Full Model Card (README)