Name: zhaohq/PureRL-1.5B-v9E-digit-w050 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v9E-digit-w050 is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-Math-1.5B. It was trained with the TRL library using GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." The math-specialized base model and the choice of GRPO both point to a focus on mathematical reasoning.
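As a rough illustration of the "group relative" idea in GRPO: several answers are sampled for the same prompt, each is scored, and every score is normalized against the mean and standard deviation of its own group. The sketch below shows only that normalization step. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration; they are not taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds the scores of several sampled answers to ONE prompt.
    `eps` (an illustrative choice) avoids division by zero when every
    answer in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt: 1.0 = correct, 0.0 = incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline.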

Key Capabilities

Enhanced Mathematical Reasoning: Fine-tuned with GRPO, a reinforcement learning method introduced to improve performance on mathematical tasks.
Qwen2.5 Base: Built on the Qwen/Qwen2.5-Math-1.5B model.
TRL Framework: Trained with the Transformer Reinforcement Learning (TRL) library.
Large Context Window: Supports a context length of 32768 tokens, which allows longer inputs and more complex problems.
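To make the context length concrete, here is a minimal sketch of a token-budget check. It assumes that prompt tokens and generated tokens share the same 32768-token window, which is the usual arrangement for decoder-only models. The helper name and its parameters are hypothetical, and the prompt token count is expected to come from the model's tokenizer, which is not shown.

```python
CONTEXT_LENGTH = 32768  # context length stated in the listing above


def max_new_tokens_for(prompt_tokens, requested, context_length=CONTEXT_LENGTH):
    """Clamp a requested generation length so prompt + output fit the window.

    Hypothetical helper: `prompt_tokens` is the tokenized prompt length,
    `requested` is the number of new tokens the caller would like.
    """
    if prompt_tokens >= context_length:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested, context_length - prompt_tokens)


short_prompt_budget = max_new_tokens_for(prompt_tokens=1000, requested=512)
long_prompt_budget = max_new_tokens_for(prompt_tokens=30000, requested=4096)
```

With a 30000-token prompt only 2768 tokens remain for the answer, so the second request is clamped rather than rejected.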

Training Details

The model was trained with GRPO, as described in the DeepSeekMath paper, which suggests the objective was accurate mathematical problem-solving. The training run was logged with Weights & Biases.
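The listing does not publish the reward function, dataset, or hyperparameters, so the following is only a sketch of how a GRPO run is typically wired up in TRL, not a reconstruction of this model's training. The reward rule (exact match on the last number in the completion), the `answer` dataset column, the output directory, and the `num_generations` value are all assumptions. `build_trainer` imports TRL lazily and is not called here, so the reward function can be tried without TRL installed.

```python
import re


def last_number(text):
    """Return the last integer or decimal in `text` as a string, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else None


def correctness_reward(completions, answer, **kwargs):
    """Score 1.0 when a completion's last number equals the reference answer.

    TRL passes completions plus extra dataset columns as keyword arguments;
    the `answer` column name is a hypothetical choice for this sketch.
    """
    return [
        1.0 if last_number(c) == str(a) else 0.0
        for c, a in zip(completions, answer)
    ]


def build_trainer(train_dataset):
    """Assemble a GRPOTrainer; all settings below are illustrative."""
    from trl import GRPOConfig, GRPOTrainer  # imported lazily on purpose

    config = GRPOConfig(
        output_dir="grpo-sketch",  # hypothetical path
        num_generations=4,         # answers sampled per prompt (assumed)
        report_to="wandb",         # log the run to Weights & Biases
    )
    return GRPOTrainer(
        model="Qwen/Qwen2.5-Math-1.5B",
        reward_funcs=correctness_reward,
        args=config,
        train_dataset=train_dataset,  # needs "prompt" and "answer" columns
    )


scores = correctness_reward(
    ["The answer is 42", "I think it is 7"], answer=["42", "8"]
)
```

A binary correctness reward like this is a common starting point for math tasks because it can be checked automatically, but real runs often combine it with format or length rewards.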
