Name: zhaohq/PureRL-1.5B-v7-s2-l1-maskon-fixed API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-l1-maskon-fixed is a 1.5-billion-parameter language model developed by zhaohq. It is fine-tuned from an unspecified base model and was trained with the TRL (Transformer Reinforcement Learning) framework.

Key Differentiator: GRPO Training

The model's main distinguishing feature is its training method, GRPO (Group Relative Policy Optimization). GRPO was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300) as a way to improve mathematical reasoning. Its use here suggests that the model targets mathematical reasoning tasks, although the card does not describe the training data or report benchmark results.
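As a rough illustration of the "group relative" part of GRPO: for each prompt, the policy samples a group of completions. Each completion's reward is then normalized against that group's mean and standard deviation, and this normalized value serves as the advantage. Using the group mean as the baseline replaces the learned critic (value model) that PPO relies on.

This is a minimal sketch under stated assumptions: scalar outcome rewards, the population standard deviation, and a small `eps` to avoid division by zero. The function name `group_relative_advantages` is hypothetical. This is not the model author's training code or TRL's implementation, and real implementations may differ, for example by using the sample standard deviation instead.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: group-normalized advantages for one prompt's completions.

    Each advantage is (reward - group mean) / (group std + eps). The population
    standard deviation is an assumption; other implementations may use sample std.
    """
    if not rewards:
        return []
    mean = statistics.fmean(rewards)   # group baseline (stands in for a critic)
    std = statistics.pstdev(rewards)   # spread of rewards within the group
    # Completions better than the group average get positive advantages,
    # worse ones get negative advantages; identical rewards all map to 0.
    return [(r - mean) / (std + eps) for r in rewards]
```

For example, with rewards `[1.0, 0.0, 0.0, 1.0]` (two correct answers and two incorrect ones), the advantages are approximately `[1.0, -1.0, -1.0, 1.0]`. If every completion in a group receives the same reward, the group contributes no learning signal.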

Technical Details

Parameters: 1.5 billion
Context Length: 32,768 tokens
Training Frameworks: TRL 0.16.0.dev0, Transformers 4.48.3, PyTorch 2.5.1
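The following is a minimal sketch of budgeting generation length within the 32,768-token context window. It assumes that prompt and generated tokens share the same window, which is typical for decoder-only models, although the card does not confirm the architecture. The helper name `max_new_tokens_for` is hypothetical. In practice, the prompt token count would come from the model's own tokenizer.

```python
CONTEXT_LENGTH = 32768  # context length listed in Technical Details


def max_new_tokens_for(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Hypothetical helper: tokens still available for generation after a prompt.

    Assumes the prompt and the completion share a single context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Clamp at zero: a prompt that already fills the window leaves no room to generate.
    return max(context_length - prompt_tokens, 0)
```

For example, a 2,048-token prompt leaves at most 30,720 tokens for the completion.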

Intended Use Cases

Given its GRPO training, this model is likely best suited to applications that require:

Mathematical problem-solving: tasks that demand logical and mathematical reasoning.
Quantitative queries: multi-step calculations and numerical questions.
Research and development: a starting point for further fine-tuning on specific mathematical or scientific domains.
