Name: SohamK18/data-cleaning-grpo API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: SohamK18

Overview

SohamK18/data-cleaning-grpo is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages the GRPO (Gradient-based Reward Policy Optimization) method, as introduced in the DeepSeekMath paper, to enhance its reasoning abilities.

Key Capabilities

Enhanced Reasoning: Specifically trained with GRPO to improve performance on tasks requiring logical and mathematical reasoning.
Instruction Following: Inherits strong instruction-following capabilities from its Qwen2.5-Instruct base.
Context Length: Supports a substantial context window of 32768 tokens, allowing for processing longer inputs.

Training Details

The model was fine-tuned using the TRL library, with a focus on applying the GRPO method. This training approach aims to push the limits of mathematical reasoning in open language models, as detailed in the DeepSeekMath research.

Good For

Applications requiring improved mathematical problem-solving.
Tasks benefiting from enhanced logical reasoning.
Use cases where a smaller, efficient model with strong reasoning is preferred.

Overview

Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)