Name: zhaohq/PureRL-7B-v7-stage1-conf-tag-instruct API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-7B-v7-stage1-conf-tag-instruct is a 7.6 billion parameter instruction-tuned model built upon the Qwen/Qwen2.5-7B-Instruct architecture. This model distinguishes itself through its specialized training methodology, utilizing the TRL (Transformer Reinforcement Learning) framework.

Key Capabilities & Training

Enhanced Reasoning: The model was trained with GRPO (Gradient-based Reward Policy Optimization), a method introduced in the DeepSeekMath paper. This technique is specifically designed to push the limits of mathematical reasoning in open language models.
Fine-tuned Performance: By fine-tuning the robust Qwen2.5-7B-Instruct base with GRPO, this model aims to improve performance on tasks that benefit from advanced reasoning and problem-solving.
Framework Versions: The training utilized TRL 0.16.0.dev0, Transformers 4.57.6, Pytorch 2.10.0, Datasets 4.8.5, and Tokenizers 0.22.2.

Use Cases

This model is particularly well-suited for applications requiring:

Mathematical Problem Solving: Its GRPO training suggests strong performance in tasks involving mathematical reasoning and complex calculations.
Instruction Following: As an instruction-tuned model, it is designed to accurately follow user prompts and generate relevant responses.
Research and Development: Developers and researchers can leverage this model for exploring advanced reasoning capabilities in LLMs, especially in areas related to mathematics and logic.

Overview

Model Overview

Key Capabilities & Training

Use Cases

Full Model Card (README)