Name: zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b2 is a 1.5-billion-parameter language model developed by zhaohq. It was fine-tuned with the TRL (Transformer Reinforcement Learning) library using GRPO (Group Relative Policy Optimization), the method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". The choice of GRPO suggests a focus on improving the model's reasoning abilities.
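
To make the GRPO reference more concrete, the sketch below shows the group-relative advantage used under outcome supervision in the DeepSeekMath paper: several completions are sampled for one prompt, and each reward is normalized by the mean and standard deviation of its group. This is a simplified illustration, not this model's training code. The function name, the use of the population standard deviation, and the eps value are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against its own group of sampled completions.

    Hypothetical helper for illustration only. `eps` is an assumed small
    constant that avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt; 1.0 marks a correct answer.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.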

Key Characteristics

Parameter Count: 1.5 billion.
Context Length: 32,768 tokens.
Training Method: GRPO, a reinforcement learning method introduced to improve mathematical reasoning in language models.
Frameworks: Trained with TRL, Transformers, PyTorch, Datasets, and Tokenizers.

Use Cases

This model is suited to text generation tasks that benefit from stronger reasoning, particularly mathematical or logical problems. Its GRPO-based training suggests it may do better on complex prompts that call for step-by-step reasoning. Developers can load it with the Hugging Face transformers pipeline for quick integration.
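
A minimal sketch of the pipeline integration mentioned above, assuming the transformers library and a PyTorch backend are installed and that the weights can be downloaded under the model ID shown on this page. The helper names and the generation settings are illustrative choices, not values recommended by the author.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b2"
MAX_CONTEXT_TOKENS = 32768  # context length stated on this page


def build_generation_kwargs(max_new_tokens=256):
    """Return call-time settings for the pipeline (illustrative values)."""
    if not 0 < max_new_tokens < MAX_CONTEXT_TOKENS:
        raise ValueError("max_new_tokens must be between 1 and the context length")
    # return_full_text=False returns only the completion, not the prompt.
    return {"max_new_tokens": max_new_tokens, "return_full_text": False}


def generate(prompt, max_new_tokens=256):
    """Generate a completion for `prompt` with a text-generation pipeline."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(prompt, **build_generation_kwargs(max_new_tokens))
    return outputs[0]["generated_text"]


# Example call (downloads the model weights on first use):
# print(generate("What is 17 * 24? Show your reasoning step by step."))
```

For repeated calls, it would be more efficient to create the pipeline once and reuse it rather than rebuilding it inside `generate`.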
