Name: zhaohq/PureRL-1.5B-v5-06-uppl API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v5-06-uppl is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model. It leverages a 32,768 token context length, making it suitable for tasks requiring extensive contextual understanding.

Key Training Details

This model was trained using the TRL (Transformer Reinforcement Learning) framework. A notable aspect of its training procedure is the application of GRPO, a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This indicates a focus on improving mathematical reasoning and problem-solving abilities through reinforcement learning techniques.

Potential Use Cases

Given its foundation in Qwen2.5-Math-1.5B and subsequent fine-tuning with GRPO, this model is likely well-suited for:

Mathematical reasoning tasks: Solving complex math problems, generating mathematical explanations, or assisting in scientific computations.
General question answering: Benefiting from its fine-tuning to provide more coherent and logically sound responses.
Applications requiring robust logical inference: Where the ability to follow multi-step reasoning is crucial.

Overview

Model Overview

Key Training Details

Potential Use Cases

Full Model Card (README)