Name: zhaohq/PureRL-1.5B-v6i-B-step01-final03 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v6i-B-step01-final03 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model. It utilizes a substantial context length of 32768 tokens, making it suitable for processing longer inputs.

Key Capabilities

Mathematical Reasoning: This model is specifically enhanced for mathematical reasoning tasks, building on the capabilities of its Qwen2.5-Math foundation.
GRPO Training: It was trained using the GRPO (Gradient Regularized Policy Optimization) method, a technique detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This training approach aims to improve performance in complex reasoning scenarios.
TRL Framework: The fine-tuning process was conducted using the TRL (Transformer Reinforcement Learning) library, a framework for applying reinforcement learning to transformer models.

Use Cases

This model is particularly well-suited for applications requiring strong mathematical problem-solving and reasoning abilities. Its training methodology and base model suggest its utility in tasks that benefit from advanced numerical and logical processing.

Overview

Model Overview

Key Capabilities

Use Cases

Full Model Card (README)