Name: zhaohq/PureRL-1.5B-v6g-B-lam03-sigmoid-maskoff API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v6g-B-lam03-sigmoid-maskoff is a 1.5-billion-parameter language model built on Qwen/Qwen2.5-Math-1.5B. It was fine-tuned with the TRL (Transformer Reinforcement Learning) library.

Key Training Methodology

The model's main differentiator is its training procedure, which uses GRPO (Group Relative Policy Optimization). This method was introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO samples several responses per prompt and scores each one relative to the others in its group, which removes the need for a separate value model. Its use here suggests the model is optimized for mathematical and multi-step reasoning tasks.
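
To make the group-relative idea concrete, here is a minimal sketch of the outcome-level advantage computation described in the DeepSeekMath paper: each reward is normalized by the mean and standard deviation of its own group. The function name, the use of the population standard deviation, and the `eps` stabilizer are illustrative assumptions; this is not this model's actual training code, and the reward values are made up.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration: population std and eps are
    assumptions, not settings taken from the model card.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Responses above the group mean get a positive advantage,
    # responses below it get a negative one.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to one math problem
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

When every response in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.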

Intended Use Cases

Given its foundation in a math-focused base model and fine-tuning with GRPO, this model is particularly well-suited for:

Mathematical Reasoning: Solving complex mathematical problems and generating logical explanations.
Problem Solving: Handling queries that require structured reasoning and analytical thinking.
Research and Development: As a base for further experimentation in reinforcement learning for language models, especially in mathematical domains.
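
For the use cases above, a rough sketch of local inference with the Hugging Face transformers library might look like the following. It assumes the weights are published on the Hugging Face Hub under the model ID shown and that `transformers` and `torch` are installed. The prompt wording and the helper names `build_prompt` and `solve` are hypothetical, since this page does not document a prompt format.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v6g-B-lam03-sigmoid-maskoff"


def build_prompt(problem: str) -> str:
    """Wrap a math problem in a step-by-step instruction.

    The instruction text is an assumption, not a documented format.
    """
    instruction = "Please reason step by step, and put your final answer within \\boxed{}."
    return f"{problem.strip()}\n{instruction}"


def solve(problem: str, max_new_tokens: int = 512) -> str:
    """Generate a solution; downloads the model weights on first call."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `solve("What is 12 * 13?")` would return the model's generated reasoning as a string; output quality and format depend on the checkpoint and are not guaranteed by this sketch.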
