Name: zhaohq/PureRL-1.5B-v7-s2-l2-kl-w2-b2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-l2-kl-w2-b2 is a 1.5-billion-parameter language model fine-tuned with the TRL (Transformer Reinforcement Learning) library. Its training uses GRPO (Group Relative Policy Optimization), the method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". The use of GRPO suggests a focus on improving the model's ability to handle complex reasoning tasks.
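The core idea of GRPO can be illustrated with a short sketch: several completions are sampled for the same prompt, each is scored, and each score is normalized against the mean and standard deviation of its own group rather than against a learned value function. The function name, the epsilon term, the choice of population standard deviation, and the reward values below are illustrative assumptions, not details taken from this model's training run.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and standard deviation.

    `rewards` holds the scores of several completions sampled for ONE prompt.
    `eps` is an assumed small constant that avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled completions for one prompt, scored 1.0 if correct else 0.0
# (made-up values for illustration).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's average receive a positive advantage and are reinforced; those below average receive a negative one. When every completion in a group gets the same reward, all advantages are zero and the group contributes no learning signal.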

Key Capabilities

Enhanced Reasoning: Trained with GRPO, which suggests optimization for tasks requiring logical and mathematical reasoning.
TRL Framework: Built upon the TRL library, indicating potential for further reinforcement learning-based fine-tuning or adaptation.

Good For

Mathematical Problem Solving: Given its training with GRPO from the DeepSeekMath paper, it is likely well-suited for mathematical reasoning and problem-solving tasks.
Complex Logical Queries: May perform effectively on tasks that demand structured logical thought processes.
Research and Development: Provides a base for exploring reinforcement learning techniques in language models, particularly for reasoning-intensive applications.
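For readers exploring further reinforcement-learning fine-tuning on math tasks, a common starting point is a verifiable reward: score a completion 1.0 when its final number matches a reference answer and 0.0 otherwise. The sketch below follows the general shape of a TRL custom reward function (it receives a list of completions plus dataset columns as keyword arguments and returns a list of floats). The function name and the `answer` column are hypothetical, and nothing here is confirmed to match the reward used to train this model.

```python
import re

def exact_answer_reward(completions, answer, **kwargs):
    """Return 1.0 when the last number in a completion equals the reference answer.

    `completions` is a list of generated strings; `answer` is a hypothetical
    dataset column holding one reference answer per completion.
    """
    rewards = []
    for completion, reference in zip(completions, answer):
        # Collect every integer or decimal number that appears in the text.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        matched = bool(numbers) and float(numbers[-1]) == float(reference)
        rewards.append(1.0 if matched else 0.0)
    return rewards

# Made-up completions for a prompt whose reference answer is 4.
scores = exact_answer_reward(
    completions=["2 + 2 = 4", "The answer is 5", "I am not sure"],
    answer=["4", "4", "4"],
)
```

A binary reward like this is simple to verify but coarse; it ignores the reasoning steps and treats any completion without a number as wrong.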
