Name: zhaohq/PureRL-1.5B-v12A-lam002 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

PureRL-1.5B-v12A-lam002 Overview

This model, developed by zhaohq, is a 1.5 billion parameter language model fine-tuned from the Qwen/Qwen2.5-Math-1.5B base. It leverages the GRPO (Generalized Reinforcement Learning with Policy Optimization) training method, a technique highlighted in the DeepSeekMath paper, to improve its performance. The model supports a substantial context length of 32768 tokens, making it suitable for processing longer inputs.

Key Capabilities

Enhanced Mathematical Reasoning: Benefits from GRPO training, a method designed to push the limits of mathematical reasoning in open language models.
Long Context Understanding: Capable of handling inputs up to 32768 tokens, useful for complex problems requiring extensive context.
Fine-tuned from Qwen2.5-Math-1.5B: Builds upon a strong mathematical foundation.

Good for

Mathematical Problem Solving: Ideal for tasks requiring advanced mathematical reasoning and computation.
Research and Development: Useful for exploring and applying reinforcement learning techniques in language model fine-tuning.
Complex Query Handling: Its long context window makes it suitable for detailed questions or scenarios.

Overview

PureRL-1.5B-v12A-lam002 Overview

Key Capabilities

Good for

Full Model Card (README)