Name: Lansechen/Qwen2.5-7B-Open-R1-GRPO-math-lighteval-1epochstop-withformat API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Lansechen

Overview

This model, Lansechen/Qwen2.5-7B-Open-R1-GRPO-math-lighteval-1epochstop-withformat, is a 7.6 billion parameter language model derived from the Qwen/Qwen2.5-7B base. It has been fine-tuned using the TRL framework, with a particular focus on improving mathematical reasoning capabilities.

Key Capabilities

Enhanced Mathematical Reasoning: Specifically trained with the GRPO (Gradient-based Reward Policy Optimization) method, as detailed in the DeepSeekMath paper, to excel in complex mathematical problem-solving.
Qwen2.5-7B Foundation: Benefits from the robust architecture and general language understanding of the Qwen2.5-7B base model.

Good For

Applications requiring strong mathematical reasoning and problem-solving.
Tasks involving numerical analysis, logical deduction, and scientific computation.
Developers looking for a specialized model to handle math-intensive queries and generate accurate mathematical responses.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)