thangvip/qwen2.5-1.5b-grpo-sgd-linear

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Feb 17, 2026 · Architecture: Transformer

The thangvip/qwen2.5-1.5b-grpo-sgd-linear model is a 1.5 billion parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper for enhancing mathematical reasoning. With a context length of 32768 tokens, it is suited to applications that benefit from long-context reasoning and mathematical problem-solving.


Model Overview

This model, thangvip/qwen2.5-1.5b-grpo-sgd-linear, is a specialized 1.5 billion parameter language model derived from the Qwen2.5-1.5B-Instruct base. It was fine-tuned using the TRL (Transformer Reinforcement Learning) library.

Key Differentiator: GRPO Training

The most significant aspect of this model is its training methodology. It leverages GRPO (Group Relative Policy Optimization), a technique detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO dispenses with PPO's learned value function and instead scores each sampled completion relative to the other completions in its sampling group, which makes it well suited to reward signals such as answer correctness in mathematics.
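The core idea can be illustrated numerically: GRPO samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation to obtain its advantage. A minimal plain-Python sketch of that computation (an illustration of the idea, not the TRL implementation):

```python
from statistics import mean, stdev

def grpo_advantages(rewards):
    """Group-relative advantages: each completion's reward is normalized
    against the mean and standard deviation of its sampling group,
    replacing the learned critic used in PPO."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Example: four completions sampled for one math prompt, scored by a
# rule-based correctness reward (1.0 = correct answer, 0.0 = incorrect).
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions receive positive advantages and incorrect ones negative advantages, so the policy gradient pushes probability mass toward the better responses within each group.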

Potential Use Cases

Given its GRPO-based training, this model is likely well-suited for:

  • Mathematical problem-solving: Tasks requiring logical deduction and numerical reasoning.
  • Complex reasoning tasks: Scenarios where understanding intricate relationships and drawing conclusions is crucial.
  • Instruction following: As it's fine-tuned from an instruct model, it should maintain strong instruction adherence, now augmented with improved reasoning.

Technical Details

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Training Framework: TRL (Transformer Reinforcement Learning)
  • Context Length: 32768 tokens

Developers can quickly integrate this model using the transformers library for standard text-generation workflows.
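A minimal quick-start sketch with the transformers library, assuming the model is available on the Hugging Face Hub under this repo id (downloading weights requires network access; `device_map="auto"` additionally requires the accelerate package):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thangvip/qwen2.5-1.5b-grpo-sgd-linear"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# The base model is instruction-tuned, so apply its chat template.
messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The prompt here is illustrative; any chat-formatted input works, though the GRPO fine-tuning suggests the model will perform best on reasoning-style queries.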