Name: zhaohq/PureRL-1.5B-v6d1-baseline-acc10 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v6d1-baseline-acc10 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model. It leverages a substantial 32768 token context length, making it capable of processing extensive inputs for complex tasks.

Key Capabilities

Enhanced Mathematical Reasoning: This model was specifically trained using GRPO (Generalized Reinforcement Learning for Policy Optimization), a method introduced in the DeepSeekMath paper, to push the limits of mathematical reasoning.
Fine-tuned with TRL: The model's training utilized the TRL (Transformer Reinforcement Learning) framework, indicating a focus on optimizing its performance through reinforcement learning techniques.
Qwen2.5-Math Base: Built upon the Qwen2.5-Math-1.5B architecture, it inherits a strong foundation for numerical and logical tasks.

Training Details

The training procedure for this model involved GRPO, as detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This method is designed to improve the model's ability to handle complex mathematical problems. The training process was tracked and can be visualized via Weights & Biases.

Good For

Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning and computation.
Scientific and Quantitative Analysis: Suitable for tasks in fields that demand precise numerical understanding and logical deduction.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)