Name: zhaohq/PureRL-1.5B-v14B-k4 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v14B-k4 is a 1.5 billion parameter language model developed by zhaohq. It has been fine-tuned using the Transformer Reinforcement Learning (TRL) framework, specifically incorporating the GRPO (Generalized Reinforcement Learning with Policy Optimization) training method.

Key Training Details

Training Method: Utilizes GRPO, a technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests a focus on enhancing the model's reasoning abilities, particularly in mathematical contexts.
Frameworks: Trained with TRL 0.16.0.dev0, Transformers 4.48.3, Pytorch 2.5.1, Datasets 4.0.0, and Tokenizers 0.21.1.

Potential Use Cases

This model is suitable for general text generation tasks. Given its training with GRPO, it may exhibit enhanced performance in scenarios requiring logical inference or structured reasoning, making it a candidate for applications beyond simple conversational AI.

Overview

Model Overview

Key Training Details

Potential Use Cases

Full Model Card (README)