Name: zhaohq/PureRL-1.5B-v14L-stage1-bce-binary-k8 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

The zhaohq/PureRL-1.5B-v14L-stage1-bce-binary-k8 is a 1.5 billion parameter language model. It has been fine-tuned using the TRL (Transformer Reinforcement Learning) framework, a library for training transformer language models with reinforcement learning.

Key Training Details

A notable aspect of this model's development is its training procedure, which incorporated GRPO (Generalized Reinforcement Learning with Policy Optimization). This method was originally introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." While the base model is not specified, the application of GRPO suggests an emphasis on improving reasoning capabilities, potentially extending beyond just mathematical contexts.

Quick Start

Users can quickly get started with this model using the transformers library, as demonstrated by the provided Python pipeline example for text generation. The model supports a context length of 32768 tokens.

Framework Versions

The model was trained using specific versions of key frameworks:

TRL: 0.16.0.dev0
Transformers: 4.48.3
Pytorch: 2.5.1
Datasets: 4.0.0
Tokenizers: 0.21.1

Potential Use Cases

Given its training methodology, this model could be particularly suitable for:

General text generation tasks where enhanced reasoning might be beneficial.
Applications requiring a compact model (1.5B parameters) with a focus on structured or logical responses.
Exploration of models fine-tuned with advanced reinforcement learning techniques like GRPO.

Overview

Model Overview

Key Training Details

Quick Start

Framework Versions

Potential Use Cases

Full Model Card (README)