Name: zhaohq/PureRL-1.5B-v6b3-bare-fmt03 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v6b3-bare-fmt03 is a 1.5-billion-parameter language model fine-tuned by zhaohq from the base model Qwen/Qwen2.5-Math-1.5B. It is intended for mathematical reasoning tasks and builds on the math-focused capabilities of its base model.

Key Training Details

This model was trained with TRL (Transformer Reinforcement Learning), Hugging Face's library for post-training language models. The key part of its training procedure is GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). Unlike PPO, GRPO does not train a separate value (critic) model; instead, it samples a group of responses for each prompt and scores each response relative to the others in its group. Here it is applied to improve mathematical problem-solving.
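
The group-relative scoring at the heart of GRPO can be sketched in a few lines. This is a minimal illustration of the outcome-reward advantage described in the DeepSeekMath paper (reward minus group mean, divided by group standard deviation), not the training code used for this model. The function name `group_relative_advantages`, the `eps` term, and the choice of the sample (rather than population) standard deviation are assumptions made for the example.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each sampled response's reward against its own group.

    Hypothetical helper: for G responses sampled from the same prompt,
    advantage_i = (r_i - mean(r)) / (std(r) + eps).
    Uses the sample standard deviation; eps avoids division by zero.
    """
    if len(rewards) < 2:
        raise ValueError("need at least two sampled responses per prompt")
    mean = statistics.fmean(rewards)
    std = statistics.stdev(rewards)  # sample std (an assumption; see lead-in)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled solutions to one problem, rewarded 1.0 if correct, else 0.0.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct samples receive a positive advantage and incorrect ones a negative advantage (about +0.87 and -0.87 here). If every sample in a group earns the same reward, all advantages are zero, so that prompt contributes no learning signal for that step.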

Capabilities and Use Cases

Given its foundation and specialized training, this model is particularly suited for:

Mathematical Reasoning: Solving complex mathematical problems with step-by-step reasoning.
Instruction Following: Responding to user prompts in a structured and coherent manner, especially for analytical questions.
Research and Development: Serving as a base for further experimentation in reinforcement learning for mathematical domains.
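
For math evaluation or further RL experiments, a common step is extracting the final answer from a completion and turning it into a reward. The sketch below assumes that, like its Qwen2.5-Math base, the model is prompted to put its final answer inside `\boxed{}`; this model's actual output format is not documented here. Both `extract_boxed_answer` and `correctness_reward` are hypothetical helpers, not the author's reward function, and exact string matching is a deliberate simplification.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Tracks brace depth so nested braces (e.g. \\frac{1}{2}) are kept intact.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
    return None  # unbalanced braces: treat as no answer


def correctness_reward(completion: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the boxed answer equals the reference."""
    answer = extract_boxed_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


print(extract_boxed_answer(r"So x = \boxed{\frac{1}{2}}."))
print(correctness_reward(r"The total is \boxed{42}.", "42"))
```

A reward of this shape produces the per-sample scores that a group-relative method such as GRPO then normalizes within each group. Real pipelines usually add symbolic or numeric equivalence checks instead of exact string comparison.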

With a context length of 32768 tokens, shared between the prompt and the generated output, the model can handle long inputs and produce detailed responses, which helps with multi-step mathematical derivations and explanations.
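
Because prompt and output share the same window, the usable generation budget shrinks as the prompt grows. A minimal sketch of that arithmetic, assuming the stated 32768-token limit and a prompt length already counted with the model's tokenizer; `max_new_tokens_for` is a hypothetical helper, not part of any serving API.

```python
CONTEXT_LENGTH = 32768  # context length stated for this model


def max_new_tokens_for(prompt_tokens: int, requested: int,
                       context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp a requested generation length so prompt + output fit the window."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"a {prompt_tokens}-token prompt leaves no room in a "
            f"{context_length}-token window"
        )
    return min(requested, remaining)


print(max_new_tokens_for(1_000, 4_096))   # short prompt: full request fits
print(max_new_tokens_for(30_000, 4_096))  # long prompt: clamped to what remains
```

For example, a 30,000-token prompt leaves at most 2,768 tokens for the derivation itself.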
