Name: zhaohq/PureRL-1.5B-v7-s2-async-l2-maskon API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-async-l2-maskon is a 1.5-billion-parameter language model fine-tuned by zhaohq with the TRL (Transformer Reinforcement Learning) library. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO was developed to improve mathematical reasoning in language models.
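To make the GRPO idea concrete, the sketch below shows the group-relative advantage step in its simplest outcome-reward form: several answers are sampled for one prompt, and each reward is normalized against the mean and standard deviation of its own group. This is an illustration only, not the training code used for this model. The function name, the use of the population standard deviation, and the eps term are assumptions made for the example.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    `eps` is an assumption added here to avoid division by zero when
    every sample in the group receives the same reward.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one math prompt, scored 1.0 if correct, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that score above the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.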

Key Capabilities

Enhanced Mathematical Reasoning: Trained with GRPO, the model targets mathematical problem solving and multi-step reasoning tasks.
Long Context Window: Supports a context length of 32768 tokens, enough to hold long problem statements and other extended inputs.
TRL Framework: Trained with the TRL library, which implements reinforcement-learning methods for fine-tuning language models, GRPO among them.
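As a rough illustration of working within the 32768-token context length, the sketch below computes how many tokens remain for generation once a prompt has been tokenized. It assumes, as is typical for decoder-only models but not stated in this listing, that the prompt and the generated tokens share one window. The helper name max_new_tokens_for is hypothetical.

```python
CONTEXT_LENGTH = 32768  # context length stated in this listing


def max_new_tokens_for(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return the generation budget left after the prompt.

    Assumes prompt and completion share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Clamp at zero: a prompt longer than the window leaves no budget.
    return max(context_length - prompt_tokens, 0)


# A 30000-token prompt leaves 2768 tokens for the model's answer.
budget = max_new_tokens_for(30000)
```

A check like this is useful before sending long problem statements, since a prompt that fills the window leaves no room for the reasoning steps in the answer.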

Good For

Applications requiring strong mathematical problem-solving abilities.
Tasks that benefit from processing long and detailed textual inputs.
Research and development in advanced language model training techniques, particularly those involving reinforcement learning for reasoning.
