thangvip/qwen2.5-1.5b-gspo-sgd-linear

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Mar 1, 2026 · Architecture: Transformer

thangvip/qwen2.5-1.5b-gspo-sgd-linear is a 1.5-billion-parameter causal language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with the GRPO method, introduced in the DeepSeekMath paper, to strengthen mathematical reasoning. Building on the Qwen2.5 architecture, it targets tasks that demand logical and mathematical problem-solving, and suits applications that need capable mathematical reasoning in a small parameter footprint.


Model Overview

thangvip/qwen2.5-1.5b-gspo-sgd-linear is a 1.5-billion-parameter language model derived from the Qwen/Qwen2.5-1.5B-Instruct base model. It has been fine-tuned with the TRL framework using the GRPO (Group Relative Policy Optimization) method.

Key Capabilities & Training

  • Enhanced Mathematical Reasoning: The core differentiator of this model is its training with GRPO, a technique detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an optimization for tasks that involve mathematical problem-solving and logical deduction.
  • Base Model: Built upon the Qwen2.5-1.5B-Instruct architecture, it inherits the general instruction-following capabilities of its parent model.
  • Training Framework: The fine-tuning process used the TRL (Transformer Reinforcement Learning) library, indicating a reinforcement-learning approach to aligning the model's outputs.
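The distinguishing idea in GRPO is that each completion's advantage is computed relative to a group of completions sampled for the same prompt, normalizing by the group's reward mean and standard deviation instead of using a learned value network. A minimal sketch of that normalization step (not the model's actual training code; whether population or sample standard deviation is used is an implementation detail):

```python
import statistics

def group_relative_advantages(rewards):
    """Compute GRPO-style advantages for one group of sampled completions.

    Each reward is normalized against the group mean and standard
    deviation, so no separate value (critic) network is required.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std; a sample std variant also exists
    if std == 0:
        # Identical rewards across the group carry no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: four completions to the same math prompt, scored by a reward function
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

These per-completion advantages then weight the policy-gradient update, so completions that score above their group's average are reinforced and below-average ones are suppressed.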

Use Cases

This model is particularly well-suited for applications requiring improved mathematical and logical reasoning, especially within the constraints of a 1.5 billion parameter model. Developers can leverage its specialized training for tasks where the base Qwen2.5-1.5B-Instruct might fall short in complex reasoning scenarios.
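As a Qwen2.5-Instruct derivative, the model expects prompts in Qwen's ChatML-style format, with `<|im_start|>`/`<|im_end|>` markers around each turn. A minimal sketch of how such a prompt is assembled (in practice, `tokenizer.apply_chat_template` from the `transformers` library handles this for you):

```python
def build_chatml_prompt(messages):
    """Assemble a ChatML-style prompt from a list of chat messages.

    `messages` is a list of {"role": ..., "content": ...} dicts. The
    trailing assistant header cues the model to generate its reply.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

# Example: a math query in the format the instruct model was tuned on
prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "What is 12 * 13?"},
])
```

Using the tokenizer's built-in chat template is preferred for real inference, since it guarantees the exact special tokens and formatting the checkpoint was trained with.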