Name: zhaohq/PureRL-7B-v7-s2-corr-maskon API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-7B-v7-s2-corr-maskon is a 7.6-billion-parameter language model fine-tuned with the TRL framework. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
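
As described in the DeepSeekMath paper, GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The following is a minimal sketch of that group-relative advantage step only. The function name, the `eps` term, and the choice of population standard deviation are assumptions made for illustration; the reward function and hyperparameters actually used to train this model are not stated here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one prompt's completions within the group.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumption: population standard deviation; implementations may
    use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled for one prompt
# (for example, 1.0 for a correct final answer and 0.0 otherwise).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above the group mean receive a positive advantage and are reinforced; those below the mean receive a negative one. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.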

Key Capabilities

TRL Fine-tuning: Fine-tuned with the TRL framework, with the aim of improving instruction following and response generation.
GRPO Training: Benefits from a training procedure designed to improve reasoning capabilities, as outlined in the DeepSeekMath research.
General Text Generation: Capable of generating coherent and contextually relevant text for a variety of prompts.

Training Details

The model was trained with GRPO, a method that the DeepSeekMath paper reports improves mathematical reasoning in large language models. The training environment used the following framework versions:

TRL: 0.16.0.dev0
Transformers: 4.57.6
PyTorch: 2.10.0
Datasets: 4.8.5
Tokenizers: 0.22.2
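
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch, assuming the frameworks are installed under their usual PyPI distribution names (`torch` for PyTorch); the helper names are hypothetical. It uses only the standard library, so it runs even when some of the packages are missing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the training details; keys are PyPI distribution names.
EXPECTED = {
    "trl": "0.16.0.dev0",
    "transformers": "4.57.6",
    "torch": "2.10.0",
    "datasets": "4.8.5",
    "tokenizers": "0.22.2",
}


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each package to (wanted, found, exact_match)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Exact string comparison: a local build tag such as "+cu128"
        # on a torch install will be reported as a mismatch.
        report[package] = (wanted, found, found == wanted)
    return report


def print_report(expected):
    for package, (wanted, found, ok) in compare_versions(expected).items():
        status = "ok" if ok else "differs"
        print(f"{package}: wanted {wanted}, found {found} ({status})")


print_report(EXPECTED)
```

A mismatch does not necessarily prevent the model from loading; the listed versions describe the training environment, not a requirement for inference.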

Good For

Developers looking for a 7.6B parameter model fine-tuned with advanced reinforcement learning techniques.
Applications requiring general text generation with potentially improved reasoning characteristics due to its GRPO training.
Experimentation with models that incorporate methods from mathematical reasoning research.
