Name: shawntzx/Qwen2.5-3B-GRPO-3_5_8_6k API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: shawntzx

Model Overview

shawntzx/Qwen2.5-3B-GRPO-3_5_8_6k is a 3.1 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-3B-Instruct base model. This model distinguishes itself by incorporating the GRPO (Gradient-based Reward Policy Optimization) training method. GRPO, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is designed to improve reasoning capabilities, particularly in complex domains.

Key Capabilities

Enhanced Reasoning: Leverages the GRPO training methodology to potentially improve performance on tasks requiring structured thought and problem-solving.
Qwen2.5 Base: Benefits from the strong foundational capabilities of the Qwen2.5-3B-Instruct model.
Extended Context: Supports a substantial context length of 32768 tokens, allowing for processing longer inputs and maintaining coherence over extended dialogues or documents.

Training Details

The model was fine-tuned using the TRL library, with specific framework versions including TRL 0.15.0.dev0, Transformers 4.49.0.dev0, Pytorch 2.5.1, Datasets 3.2.0, and Tokenizers 0.21.0.

Good For

Applications requiring improved reasoning, especially in areas where GRPO's benefits are applicable.
Tasks that can leverage a 3.1 billion parameter model with a large context window for detailed understanding and generation.
Developers looking to experiment with models trained using advanced optimization techniques like GRPO.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)