Name: zhaohq/PureRL-1.5B-v6f-analysis-200step API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

The zhaohq/PureRL-1.5B-v6f-analysis-200step is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model. It was developed by zhaohq and trained using the TRL (Transformer Reinforcement Learning) framework.

Key Training Methodology

A significant aspect of this model is its training procedure, which incorporates GRPO (Generalized Reinforcement Learning with Policy Optimization). This method, introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," is specifically designed to improve a model's ability to handle complex mathematical and analytical reasoning tasks.

Capabilities and Use Cases

Given its foundation in a math-focused base model and the application of GRPO, this model is particularly suited for:

Mathematical Reasoning: Excelling in tasks that require logical deduction and problem-solving in mathematical contexts.
Analytical Tasks: Processing and generating responses for questions that demand structured analysis.
Complex Problem Solving: Handling inquiries that go beyond simple fact retrieval, requiring deeper understanding and inference.

Technical Details

Base Model: Qwen/Qwen2.5-Math-1.5B
Parameter Count: 1.5 Billion
Context Length: 32768 tokens
Training Framework: TRL (version 0.16.0.dev0)
Training Method: GRPO, as detailed in the DeepSeekMath paper.

This model offers a compact yet powerful solution for applications requiring robust analytical and mathematical reasoning, leveraging advanced reinforcement learning techniques.

Overview

Model Overview

Key Training Methodology

Capabilities and Use Cases

Technical Details

Full Model Card (README)