Name: zhaohq/PureRL-1.5B-v7-s2-l2-kl-w0-b1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

The zhaohq/PureRL-1.5B-v7-s2-l2-kl-w0-b1 is a 1.5 billion parameter language model developed by zhaohq. It is a fine-tuned iteration of the zhaohq/PureRL-1.5B-v7-stage1-reasoning model, building upon its initial reasoning capabilities.

Key Training Details

This model was trained using the TRL (Transformer Reinforcement Learning) framework, specifically version 0.16.0.dev0. A significant aspect of its training procedure is the implementation of GRPO (Generalized Reinforcement Learning with Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This suggests a strong focus on improving mathematical and general reasoning performance.

Intended Use Cases

Given its fine-tuning with the GRPO method, this model is particularly suited for:

Mathematical reasoning tasks: Leveraging the techniques from DeepSeekMath, it aims to excel in complex mathematical problem-solving.
Advanced reasoning applications: Building on its stage1 reasoning base, it can be applied to tasks requiring logical deduction and problem-solving.

Developers can quickly get started using the provided transformers pipeline for text generation, as demonstrated in the quick start guide.

Overview

Overview

Key Training Details

Intended Use Cases

Full Model Card (README)