Name: zhaohq/PureRL-1.5B-v7-s2-l2-kl-w2-b0 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

zhaohq/PureRL-1.5B-v7-s2-l2-kl-w2-b0 is a 1.5 billion parameter language model, fine-tuned by zhaohq from its base model, PureRL-1.5B-v7-stage1-reasoning. This model leverages the GRPO (Generalized Reinforcement Learning from Policy Optimization) training method, a technique highlighted in the DeepSeekMath paper, which focuses on pushing the limits of mathematical reasoning in open language models. It supports a substantial context length of 32768 tokens.

Key Capabilities

Enhanced Mathematical Reasoning: Benefits from the GRPO training procedure, making it suitable for tasks requiring advanced logical and mathematical problem-solving.
Fine-tuned Performance: Built upon a reasoning-focused base model, further optimized for specific performance characteristics.
Extended Context Window: Offers a 32768-token context length, allowing for processing longer inputs and more complex problem descriptions.

Good for

Mathematical Problem Solving: Ideal for applications that involve complex mathematical reasoning, logical deduction, and quantitative analysis.
Research and Development: Useful for researchers exploring reinforcement learning from human feedback (RLHF) techniques, particularly GRPO, in smaller-scale models.
Question Answering: Can be applied to question-answering systems where the questions require deep reasoning rather than simple fact retrieval.

Overview

Overview

Key Capabilities

Good for

Full Model Card (README)