Name: zhaohq/PureRL-1.5B-v7-s2-l1-maskon API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

The zhaohq/PureRL-1.5B-v7-s2-l1-maskon is a 1.5 billion parameter language model developed by zhaohq. It is a fine-tuned version of an unspecified base model, leveraging the Transformer Reinforcement Learning (TRL) framework for its training.

Key Capabilities and Training

The primary differentiator of this model is its training procedure, which utilizes GRPO (Generalized Reinforcement Learning with Policy Optimization). This method was introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The application of GRPO suggests an optimization for complex reasoning tasks, particularly in mathematical domains, aiming to improve the model's ability to process and generate logically sound responses.

Technical Details

Parameters: 1.5 Billion
Context Length: 32768 tokens
Training Frameworks: TRL (version 0.16.0.dev0), Transformers (version 4.48.3), Pytorch (version 2.5.1), Datasets (version 4.0.0), Tokenizers (version 0.21.1).

Use Cases

This model is particularly suited for applications requiring enhanced mathematical reasoning and complex problem-solving. Its fine-tuning with GRPO indicates a focus on improving the logical coherence and accuracy of generated text in analytical contexts. Developers can integrate this model for tasks where robust reasoning capabilities are crucial, potentially outperforming general-purpose models in specific analytical domains.

Overview

Model Overview

Key Capabilities and Training

Technical Details

Use Cases

Full Model Card (README)