Name: zhaohq/PureRL-7B-v6e-B-lam03-sigmoid-maskon-acc05 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

This model, PureRL-7B-v6e-B-lam03-sigmoid-maskon-acc05, is a 7.6 billion parameter language model developed by zhaohq. It is a fine-tuned version of the Qwen/Qwen2.5-Math-7B base model, specifically optimized for mathematical reasoning tasks. The model was trained using the Transformer Reinforcement Learning (TRL) framework, incorporating the GRPO (Gradient-based Reward Policy Optimization) method.

Key Training Details

Base Model: Qwen/Qwen2.5-Math-7B
Training Method: GRPO, as introduced in the DeepSeekMath paper, which focuses on pushing the limits of mathematical reasoning in open language models.
Framework: TRL (Transformer Reinforcement Learning)
Context Length: Supports a context length of 32768 tokens.

Use Cases

This model is particularly well-suited for applications requiring advanced mathematical problem-solving and reasoning. Its fine-tuning with the GRPO method suggests improved performance on complex mathematical queries and tasks compared to general-purpose language models.

Overview

Overview

Key Training Details

Use Cases

Full Model Card (README)