Name: zhaohq/PureRL-1.5B-v7-s2-corr-maskoff API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-corr-maskoff is a 1.5-billion-parameter language model fine-tuned with TRL (Transformer Reinforcement Learning), Hugging Face's library for post-training language models. It supports a context length of 32,768 tokens, enough to hold long prompts and documents in a single request.
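
The prompt and the generated answer normally share one context window, so a long input directly reduces how much the model can write back. The helper below is a minimal sketch of that budgeting, assuming a decoder-only model whose 32,768-token window covers both prompt and output, and assuming you already know the prompt's token count (for example from the model's tokenizer). `max_new_tokens` and its parameters are hypothetical names, not part of any API.

```python
# Context window stated on the model card; the helper itself is illustrative.
CONTEXT_WINDOW = 32768


def max_new_tokens(prompt_tokens: int, requested: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many new tokens fit after the prompt without exceeding the window.

    Assumes prompt and generated tokens share a single context window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens >= context_window:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; the window holds {context_window}"
        )
    remaining = context_window - prompt_tokens
    # Never ask for more tokens than the window has left.
    return min(requested, remaining)


if __name__ == "__main__":
    # A 30,000-token document leaves only 2,768 tokens for the answer.
    print(max_new_tokens(30000, 4096))  # 2768
    # A short prompt leaves the full request available.
    print(max_new_tokens(1000, 4096))  # 4096
```

Clamping rather than raising when `requested` is too large keeps a long-document request usable; raising is reserved for prompts that cannot fit at all.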

Key Differentiator: GRPO Training

A core aspect of this model is its training methodology. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO is a variant of PPO that drops the separate value (critic) model: for each prompt it samples a group of completions and scores each one relative to the group's average reward. The method was developed to strengthen mathematical reasoning in language models.
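
The step that gives GRPO its name is the group-relative advantage. In the outcome-supervision form described in the DeepSeekMath paper, each sampled completion's reward is normalized by the mean and standard deviation of its group, and the result is used as the advantage for every token of that completion. The sketch below shows only this normalization in plain Python; the full objective also includes a PPO-style clipped policy ratio and a KL penalty against a reference model, both omitted here. The function name and the `eps` guard are illustrative choices, not taken from this model's training code.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Turn one group's scalar rewards into GRPO-style advantages.

    All rewards must come from completions sampled for the same prompt.
    Each completion's advantage is (r_i - mean(r)) / (std(r) + eps); with
    outcome supervision this value is shared by every token of completion i.
    """
    if not rewards:
        raise ValueError("a group needs at least one reward")
    mean = statistics.fmean(rewards)
    # Population std; implementations differ (sample vs. population, eps size).
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one math question: 1.0 = correct, 0.0 = wrong.
    # Correct answers receive a positive advantage, wrong ones a negative one.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because the baseline is the group mean, a group in which every completion earns the same reward yields all-zero advantages and contributes no policy-gradient signal through this term.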

Capabilities

Enhanced Mathematical Reasoning: Aimed at mathematical problem-solving, the skill its GRPO training targets.
Long Context Understanding: A 32,768-token context window leaves room for long prompts and documents.
TRL Framework: Fine-tuned with the TRL library, which implements GRPO among its reinforcement learning trainers.

Recommended Use Cases

This model is particularly well-suited for applications requiring:

Solving mathematical problems and equations.
Logical deduction and reasoning tasks.
Processing and generating text where mathematical understanding is crucial.
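
For math problems with a single checkable answer, applications often grade the model's output by extracting its final answer and comparing it with a reference. The sketch below assumes the model is prompted to wrap its final answer in `\boxed{...}`, a common convention for math-tuned models that this card does not document. `extract_boxed` and `is_correct` are hypothetical helpers, and the exact string comparison will miss answers that are mathematically equal but written differently.

```python
from typing import Optional

BOXED = "\\boxed{"  # assumed answer marker; not documented for this model


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None.

    Braces are matched by depth, so nested LaTeX such as \\frac{1}{2} is kept whole.
    """
    start = text.rfind(BOXED)
    if start == -1:
        return None
    begin = start + len(BOXED)
    depth = 1
    for pos in range(begin, len(text)):
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:pos]
    return None  # unbalanced braces: treat as no answer


def is_correct(completion: str, reference: str) -> bool:
    """Exact-match grading after trimming whitespace around both answers."""
    answer = extract_boxed(completion)
    return answer is not None and answer.strip() == reference.strip()


if __name__ == "__main__":
    output = r"The probability is \boxed{\frac{1}{2}}."
    print(extract_boxed(output))  # \frac{1}{2}
    print(is_correct(r"So the total is \boxed{42}.", "42"))  # True
```

Taking the last `\boxed{...}` rather than the first favors the final answer when a completion boxes intermediate results along the way.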

Training Environment

The model was trained with the following library versions:

TRL: 0.16.0.dev0
Transformers: 4.48.3
PyTorch: 2.5.1
Datasets: 4.0.0
Tokenizers: 0.21.1
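
To reproduce the training environment, or to see how far a local setup drifts from it, the installed versions can be compared with the list above using the standard library's `importlib.metadata`. This is a minimal sketch: `version_mismatches` is a hypothetical helper, it ignores local version suffixes such as `+cu121` on PyTorch wheels, and a released TRL from PyPI will not match `0.16.0.dev0`, since a `.dev0` version usually indicates a development build installed from source.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions listed under "Training Environment", keyed by PyPI distribution name.
TRAINING_VERSIONS: Dict[str, str] = {
    "trl": "0.16.0.dev0",
    "transformers": "4.48.3",
    "torch": "2.5.1",  # PyTorch is published on PyPI as "torch"
    "datasets": "4.0.0",
    "tokenizers": "0.21.1",
}


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop a local version label such as "+cu121" before comparing.
        base = installed.split("+", 1)[0] if installed is not None else None
        if base != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    drift = version_mismatches(TRAINING_VERSIONS)
    if not drift:
        print("Environment matches the listed training versions.")
    for package, (wanted, found) in drift.items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```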
