Name: zhaohq/PureRL-1.5B-v7-s2-corr-maskon-afew API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

This model, zhaohq/PureRL-1.5B-v7-s2-corr-maskon-afew, is a 1.5 billion parameter language model developed by zhaohq. It is a fine-tuned iteration of the zhaohq/PureRL-1.5B-v7-stage1-A-fewshot base model, specifically trained using the TRL (Transformer Reinforcement Learning) framework.

Key Training Details

Fine-tuning Method: The model was trained utilizing GRPO (Gradient-based Reward Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
Frameworks Used: Training involved TRL (version 0.16.0.dev0), Transformers (version 4.48.3), PyTorch (version 2.5.1+cu124), Datasets (version 4.0.0), and Tokenizers (version 0.21.1).

Potential Use Cases

Given its fine-tuning with the GRPO method, this model is likely optimized for:

Mathematical Reasoning: Tasks that require advanced mathematical problem-solving and logical deduction.
Enhanced Reasoning: General reasoning tasks where the GRPO method's benefits can be applied beyond pure mathematics.

Developers can quickly integrate this model using the Hugging Face pipeline for text generation, as demonstrated in the quick start example.

Overview

Model Overview

Key Training Details

Potential Use Cases

Full Model Card (README)