Name: zhaohq/PureRL-1.5B-v7-s2-l2-maskon API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

zhaohq/PureRL-1.5B-v7-s2-l2-maskon is a 1.5 billion parameter language model, fine-tuned using the TRL (Transformer Reinforcement Learning) framework. This model leverages the GRPO (Generative Reinforcement Learning with Policy Optimization) training method, which is detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". The training procedure utilized specific versions of TRL (0.16.0.dev0), Transformers (4.57.6), Pytorch (2.10.0), Datasets (4.8.5), and Tokenizers (0.22.2).

Key Capabilities

Enhanced Mathematical Reasoning: Trained with GRPO, a method focused on improving mathematical problem-solving.
Reinforcement Learning Fine-tuning: Utilizes the TRL library for advanced fine-tuning techniques.
Large Context Window: Supports a context length of 32768 tokens, allowing for processing longer inputs and complex problems.

Good For

Applications requiring strong mathematical reasoning.
Research into reinforcement learning fine-tuning methods for language models.
Tasks benefiting from a model with a substantial context window for detailed problem analysis.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)