Name: zhaohq/PureRL-1.5B-v12C-lam010 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v12C-lam010 is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model. It supports a context length of 32,768 tokens, which lets it take longer inputs such as multi-step problems.

Key Differentiator: GRPO Training

What sets this model apart is its training methodology. It was fine-tuned with reinforcement learning (RL) using GRPO (Group Relative Policy Optimization), the method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." GRPO is designed to improve a model's ability to perform advanced mathematical reasoning and problem-solving tasks.
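The core idea behind GRPO can be illustrated with a small sketch: several completions are sampled for the same prompt, each receives a reward, and each completion's advantage is its reward normalized against the rest of its group. The function name, the `eps` term, and the example rewards below are illustrative assumptions, not taken from this model's training code, and the use of the population standard deviation is a choice made for the sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration: each advantage is
    (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one math problem, rewarded 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers get positive advantages, incorrect ones negative.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Because the baseline comes from the group itself, this style of advantage estimate does not need a separately trained value model. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.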

Use Cases

Mathematical Reasoning: Ideal for applications requiring robust mathematical problem-solving, logical deduction, and numerical analysis.
Research and Development: Useful for researchers exploring the impact of RL-based fine-tuning methods like GRPO on specialized reasoning tasks.

Technical Details

The model was trained with the TRL library (version 0.16.0.dev0), Transformers 4.48.3, and PyTorch 2.5.1. The training regimen targets tasks where precise mathematical reasoning is critical.
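For anyone trying to match the reported training environment, a minimal sketch like the following compares locally installed package versions against the ones listed above. The distribution names (`trl`, `transformers`, `torch`) are assumed to be the usual PyPI names, and `compare_versions` is a hypothetical helper; nothing here is required to call the model through an API.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in the listing; the PyPI distribution names are assumed.
REPORTED = {
    "trl": "0.16.0.dev0",
    "transformers": "4.48.3",
    "torch": "2.5.1",
}


def compare_versions(reported):
    """Return {package: (expected, installed_or_None, matches)}."""
    result = {}
    for package, expected in reported.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        # Ignore local build suffixes such as "+cu121" when comparing.
        matches = installed is not None and installed.split("+")[0] == expected
        result[package] = (expected, installed, matches)
    return result


report = compare_versions(REPORTED)
for package, (expected, installed, matches) in report.items():
    status = "ok" if matches else "differs"
    print(f"{package}: expected {expected}, installed {installed} ({status})")
```

A mismatch is not necessarily a problem for inference; the check only shows how far the local setup is from the versions reported for training.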
