Name: zhaohq/PureRL-1.5B-v5-06-mc2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v5-06-mc2 is a 1.5-billion-parameter language model built on Qwen/Qwen2.5-Math-1.5B. It has a context window of 32,768 tokens, which suits longer inputs and complex problem statements. The model was trained with the TRL framework using GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This training approach specifically targets mathematical reasoning.
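To make the GRPO idea concrete, the sketch below shows the group-relative advantage step in its simplest outcome-reward form: several completions are sampled for one prompt, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. This is an illustration only, not the training code used for this model. The function name, the `eps` stabilizer, and the choice of population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one sampled group (hypothetical helper).

    rewards: scalar rewards for completions sampled from the SAME prompt.
    eps: small constant (an assumption here) that avoids division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is another valid choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one math problem, reward 1.0 if the
# final answer is correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.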

Key Capabilities

Enhanced Mathematical Reasoning: Fine-tuned with GRPO, a reinforcement learning method introduced to improve performance on mathematical reasoning tasks.
Long Context Understanding: Supports a 32K token context length, allowing for the processing of extensive problem descriptions or multi-step reasoning chains.
Qwen2.5-Math Base: Leverages the foundational capabilities of the Qwen2.5-Math-1.5B model, which is inherently strong in mathematical domains.

Good For

Mathematical Problem Solving: Intended for applications that require step-by-step mathematical reasoning.
Complex Logical Deduction: Suitable for tasks that benefit from processing detailed information and deriving logical conclusions.
Research and Development: Provides a base for further experimentation and fine-tuning in mathematical AI applications, particularly those exploring reinforcement learning techniques like GRPO.
