Name: zhaohq/PureRL-1.5B-v11C-lam010 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v11C-lam010 is a 1.5-billion-parameter language model built on the Qwen/Qwen2.5-Math-1.5B base model. It was fine-tuned with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement learning method that scores each sampled completion relative to the other completions drawn for the same prompt.
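The group-relative idea behind GRPO can be illustrated with a minimal sketch. It is not the training code for this model: the function name, the example rewards, the use of the population standard deviation, and the small eps term are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the group of completions for one prompt.

    Illustrative only: eps and the population standard deviation are
    assumptions, not settings taken from this model's training run.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions that beat the group average get a positive advantage,
    # those below it get a negative one.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled for the same math prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Because the baseline is the group mean, no separate value model is needed to estimate it, and a group in which every completion receives the same reward yields zero advantage for all of them.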

Key Capabilities

Enhanced Mathematical Reasoning: The model was trained with GRPO, the method introduced in the DeepSeekMath paper to improve mathematical reasoning in open language models.
Reinforcement Learning Fine-tuning: The training procedure uses the TRL library, so the model is tuned through reinforcement learning on top of its base model.
Qwen2.5-Math Base: Inherits the mathematical pretraining of its base model, Qwen/Qwen2.5-Math-1.5B.

Training Details

The model was trained with GRPO, a method described in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an emphasis on multi-step mathematical problem solving. Training used TRL 0.16.0.dev0, Transformers 4.48.3, PyTorch 2.5.1, Datasets 4.0.0, and Tokenizers 0.21.1.
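To compare a local environment against the framework versions listed above, a short check along the following lines may help. The PyPI package names (for example "torch" for PyTorch) and the helper name compare_versions are assumptions; the listing gives only the framework names and version numbers.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed for this model; the package names are assumed PyPI names.
LISTED_VERSIONS = {
    "trl": "0.16.0.dev0",
    "transformers": "4.48.3",
    "torch": "2.5.1",
    "datasets": "4.0.0",
    "tokenizers": "0.21.1",
}


def compare_versions(listed):
    """Return {package: (listed, installed_or_None, exact_match)}."""
    report = {}
    for package, wanted in listed.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, matches) in compare_versions(LISTED_VERSIONS).items():
    status = "ok" if matches else "differs"
    print(f"{package}: listed {wanted}, installed {installed or 'not installed'} ({status})")
```

The comparison is an exact string match, so a PyTorch build with a local suffix such as "+cu121" is reported as differing even when the base version is the same.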

Good For

Applications requiring strong mathematical problem-solving.
Tasks involving logical reasoning and deduction.
Research into reinforcement learning applications for language models.
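For the use cases above, local inference with the Transformers library might look like the sketch below. It assumes the model weights can be downloaded under the listed model ID, and the step-by-step prompt wording is borrowed from common practice with Qwen2.5-Math models rather than confirmed for this fine-tune. The helper names build_prompt and solve are hypothetical.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v11C-lam010"


def build_prompt(problem: str) -> str:
    """Wrap a math problem in a step-by-step instruction (assumed format)."""
    return (
        f"{problem}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def solve(problem: str, model_id: str = MODEL_ID, max_new_tokens: int = 512) -> str:
    """Generate a solution; downloads the model on first use."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (not run here because it downloads the model):
# print(solve("What is the sum of the first 10 positive integers?"))
```

Since the prompt format used during training is not stated in this listing, it may be worth trying a few phrasings and comparing the outputs.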
