Name: zhaohq/PureRL-1.5B-v7-s2-l2-maskoff API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

The zhaohq/PureRL-1.5B-v7-s2-l2-maskoff is a 1.5 billion parameter language model developed by zhaohq. It has been fine-tuned using Reinforcement Learning (RL) through the TRL library, specifically employing the GRPO method.

Key Characteristics

Reinforcement Learning Fine-tuning: The model's training incorporates GRPO (Generalized Reinforcement Learning with Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an emphasis on improving reasoning and response quality.
Parameter Count: With 1.5 billion parameters, it offers a balance between computational efficiency and performance.
Context Length: The model supports a substantial context length of 32768 tokens, allowing it to process and generate longer, more detailed responses while maintaining context.

Training Details

The model was trained using TRL version 0.16.0.dev0, Transformers 4.48.3, Pytorch 2.5.1, Datasets 4.0.0, and Tokenizers 0.21.1. The training process can be visualized via Weights & Biases.

Use Cases

This model is suitable for text generation tasks where nuanced understanding and coherent, context-aware responses are required, potentially benefiting from its RL-enhanced reasoning capabilities.

Overview

Model Overview

Key Characteristics

Training Details

Use Cases

Full Model Card (README)