Name: zhaohq/PureRL-7B-v6e-A-lam01-sigmoid-maskon-acc05 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-7B-v6e-A-lam01-sigmoid-maskon-acc05 is a 7.6 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-Math-7B base model. This model leverages the GRPO (Gradient-based Reward Policy Optimization) training method, as introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The fine-tuning process was conducted using the TRL framework.

Key Capabilities

Enhanced Mathematical Reasoning: Specialized training with GRPO aims to improve performance on mathematical problem-solving and logical reasoning tasks.
Qwen2.5-Math Foundation: Benefits from the strong mathematical pre-training of its base model, Qwen2.5-Math-7B.
Instruction Following: Designed to generate coherent and relevant responses to user prompts, as demonstrated by the quick start example.

Training Details

The model's training procedure utilized the TRL (Transformer Reinforcement Learning) library. The GRPO method, central to its training, is a technique for optimizing language models for specific reasoning tasks. Further details on the training run can be found on Weights & Biases (wandb.ai/zhaomichaelk-university-of-georgia/emnlp_7b/runs/vmjtfdbc).

Recommended Use Cases

This model is particularly well-suited for applications requiring:

Solving complex mathematical problems.
Advanced logical reasoning and analytical tasks.
Generating detailed and accurate explanations for quantitative questions.

Overview

Model Overview

Key Capabilities

Training Details

Recommended Use Cases

Full Model Card (README)