Name: zhaohq/PureRL-1.5B-v7-s2-corr-maskon API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-corr-maskon is a 1.5-billion-parameter language model fine-tuned with the TRL (Transformer Reinforcement Learning) framework. It was trained with GRPO (Group Relative Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
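The core idea of GRPO can be sketched briefly. For each prompt, the policy samples a group of completions. Each completion's reward is normalized against the group's mean and standard deviation, so no separate value (critic) model is needed. That advantage then feeds a PPO-style clipped surrogate term. The Python below is a minimal illustration of those two steps, not the training code used for this checkpoint. It omits the KL penalty and token-level averaging. It uses the population standard deviation, though implementations differ on that choice. The function names `group_relative_advantages` and `clipped_surrogate` are hypothetical.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps); eps avoids division by zero
    when every completion in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term (to be maximized) for one sample.

    ratio is pi_theta(o) / pi_theta_old(o); the KL penalty GRPO adds is omitted here.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Four completions for one prompt: two judged correct (1.0), two incorrect (0.0).
    advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print(advs)  # correct answers get positive advantage, incorrect ones negative
    print(clipped_surrogate(1.5, advs[0]))  # large ratio is clipped at 1 + clip_eps
```

Because advantages are relative within the group, a prompt where every sample scores the same contributes no learning signal. This is one reason group size and reward design matter in practice.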

Key Capabilities

Reinforcement Learning Fine-tuning: Fine-tuned with the TRL library's reinforcement-learning tooling, with the aim of improving performance on targeted tasks.
GRPO Training Method: Trained with GRPO, which the DeepSeekMath work introduced to strengthen mathematical reasoning; it estimates advantages from the relative rewards of a group of sampled completions instead of from a learned value model.
Large Context Window: Supports a context length of 32768 tokens, enabling the processing and generation of longer and more complex texts.
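In practice, the 32768-token window must hold both the prompt and the generated tokens, as is typical for decoder-only models. The Python below is a small sketch that left-truncates a tokenized prompt so that room remains for generation. It assumes token IDs come from the model's own tokenizer. The helper name `fit_prompt` is hypothetical, and it keeps the most recent tokens, which suits chat-style inputs.

```python
from typing import List, Sequence

CONTEXT_LENGTH = 32768  # context length stated on this model card


def fit_prompt(
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    context_length: int = CONTEXT_LENGTH,
) -> List[int]:
    """Return prompt token IDs trimmed so prompt + generation fits the window.

    Keeps the most recent tokens (left-truncation); raises if the requested
    generation length alone leaves no room for any prompt.
    """
    if max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be smaller than the context length")
    budget = context_length - max_new_tokens
    if len(prompt_ids) <= budget:
        return list(prompt_ids)
    return list(prompt_ids[-budget:])


if __name__ == "__main__":
    long_prompt = list(range(40000))  # stand-in for 40k real token IDs
    trimmed = fit_prompt(long_prompt, max_new_tokens=1000)
    print(len(trimmed))  # 31768 tokens kept, leaving 1000 for generation
```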

Good For

Mathematical Reasoning Tasks: Given its foundation in the DeepSeekMath paper's GRPO method, this model is likely well-suited for tasks requiring robust mathematical understanding and problem-solving.
Research in RLHF: Provides a practical example of a model trained with GRPO, useful for researchers exploring RLHF and related reinforcement-learning fine-tuning methods.
Applications requiring long context: Its substantial context window makes it suitable for applications that involve processing or generating extensive documents, code, or complex dialogues.
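For mathematical-reasoning use, one simple way to score outputs is a binary exact-match check on the final answer. Rewards of this kind are often paired with GRPO on math problems. This page does not state which reward was used to train this checkpoint, so treat the following as a hypothetical evaluation helper. It assumes the model writes its final answer inside `\boxed{...}`, and the names `extract_boxed` and `exact_match_reward` are illustrative.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
    return None  # unbalanced braces: no complete answer


def exact_match_reward(completion: str, reference: str) -> float:
    """1.0 if the boxed answer matches the reference (ignoring spaces), else 0.0."""
    pred = extract_boxed(completion)
    if pred is None:
        return 0.0
    return 1.0 if pred.replace(" ", "") == reference.replace(" ", "") else 0.0


if __name__ == "__main__":
    print(extract_boxed(r"So the answer is \boxed{\frac{1}{2}}."))
    print(exact_match_reward(r"... \boxed{42}", "42"))
```

String matching is deliberately strict here; equivalent forms such as `0.5` and `\frac{1}{2}` would need symbolic comparison to be counted as equal.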
