Name: zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

The zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b1 is a 1.5 billion parameter language model developed by zhaohq. It is a fine-tuned iteration of the zhaohq/PureRL-1.5B-v7-stage1-reasoning base model, specifically enhanced for advanced reasoning tasks. The model leverages a substantial context length of 32768 tokens, allowing it to process and generate longer, more complex sequences.

Training Methodology

This model was trained using the TRL (Transformer Reinforcement Learning) framework. A key aspect of its training procedure is the implementation of GRPO (Generalized Reinforcement Learning with Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This indicates a strong focus on improving the model's ability to handle and solve mathematical reasoning problems.

Key Features

Parameter Count: 1.5 billion parameters.
Context Length: 32768 tokens.
Fine-tuned for Reasoning: Built upon a reasoning-focused base model.
GRPO Integration: Utilizes the GRPO method for enhanced mathematical reasoning capabilities.
TRL Framework: Developed using the TRL library for efficient reinforcement learning from human feedback or other reward signals.

Potential Use Cases

Given its training methodology and focus, this model is particularly well-suited for applications requiring:

Mathematical Problem Solving: Tasks involving complex mathematical reasoning and calculations.
Logical Deduction: Scenarios where structured logical thinking is required.
Advanced Question Answering: Answering questions that demand more than simple factual recall, especially those with a mathematical or logical component.

Overview

Model Overview

Training Methodology

Key Features

Potential Use Cases

Full Model Card (README)