Name: zhaohq/PureRL-1.5B-v7-stage1-A-fewshot API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-stage1-A-fewshot is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model using the TRL (Transformer Reinforcement Learning) library.

Key Training Details

The model was trained with GRPO (Group Relative Policy Optimization), the method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." GRPO samples several completions per prompt and scores each one relative to the others in its group, so no separate value model is needed. Its use here, on top of a math-focused base model, points to an emphasis on mathematical reasoning.
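The core of GRPO is the group-relative advantage: each completion's reward is normalized against the mean and standard deviation of the rewards in its own group. The following is a minimal sketch of that one step, not the training code used for this model; the function name, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's completions against their own group.

    `eps` is an assumed small constant that avoids division by zero when
    every completion in the group receives the same reward.
    """
    group_mean = mean(rewards)
    group_std = pstdev(rewards)  # population standard deviation (an assumption)
    return [(r - group_mean) / (group_std + eps) for r in rewards]


# Hypothetical rewards for four completions sampled for a single prompt:
# 1.0 where the final answer was correct, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat their group's average receive a positive advantage and are reinforced; those below it receive a negative one. When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.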

Intended Use

Because it builds on a math-focused base model and was trained with GRPO, this model is likely best suited to mathematical reasoning and problem-solving tasks. Developers can load it with the Hugging Face Transformers pipeline for text generation.
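A minimal sketch of loading the model through the Transformers text-generation pipeline might look like the following. The prompt wording in `build_prompt`, the `solve` helper, and the `max_new_tokens` default are hypothetical choices for illustration; the listing does not specify a prompt format, so check the full model card before relying on this one.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v7-stage1-A-fewshot"


def build_prompt(question):
    """Wrap a question in a plain instruction (hypothetical prompt format)."""
    return (
        "Solve the following problem step by step.\n\n"
        f"Problem: {question}\n\n"
        "Solution:"
    )


def solve(question, max_new_tokens=256):
    """Generate a solution with the Transformers text-generation pipeline."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(
        build_prompt(question),
        max_new_tokens=max_new_tokens,
        return_full_text=False,  # return only the newly generated text
    )
    return outputs[0]["generated_text"]
```

Calling `solve("What is the sum of the first 10 positive integers?")` downloads the model weights on first use and returns the generated text as a string. For repeated calls, building the pipeline once and reusing it avoids reloading the model each time.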

Framework Versions

Key frameworks used during its development include:

TRL: 0.16.0.dev0
Transformers: 4.57.6
PyTorch: 2.10.0
Datasets: 4.8.5
Tokenizers: 0.22.2
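To see how a local environment compares with the versions listed above, a short check along these lines may help. It is a sketch only: the distribution names (`trl`, `transformers`, `torch`, `datasets`, `tokenizers`) are the assumed PyPI names, the helper functions are hypothetical, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by the assumed PyPI distribution names.
TRAINING_VERSIONS = {
    "trl": "0.16.0.dev0",
    "transformers": "4.57.6",
    "torch": "2.10.0",
    "datasets": "4.8.5",
    "tokenizers": "0.22.2",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_report(expected=TRAINING_VERSIONS, lookup=installed_version):
    """Map each package to (expected, installed, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Ignore local build suffixes such as "+cu128" when comparing.
        matches = found is not None and found.split("+")[0] == wanted
        report[package] = (wanted, found, matches)
    return report


for package, (wanted, found, matches) in version_report().items():
    status = "ok" if matches else "differs"
    print(f"{package}: trained with {wanted}, installed {found} ({status})")
```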
