thangvip/qwen2.5-1.5b-dspo-no-sft-sgd-linear
The thangvip/qwen2.5-1.5b-dspo-no-sft-sgd-linear model is a 1.5 billion parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with the GRPO method introduced in the DeepSeekMath paper, which targets improved mathematical reasoning. The model is intended for tasks that demand stronger reasoning, particularly in mathematical contexts, and supports a context length of 131,072 tokens.
Overview
This model, thangvip/qwen2.5-1.5b-dspo-no-sft-sgd-linear, is a 1.5 billion parameter language model built on Qwen2.5-1.5B-Instruct. It was fine-tuned with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. This training approach aims to improve the model's reasoning abilities, particularly on complex mathematical problems.
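The core idea behind GRPO can be sketched in a few lines: instead of learning a value-function baseline, it samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. The snippet below is an illustrative sketch of that advantage computation only, not the actual training code used for this model:

```python
# Illustrative sketch of GRPO's group-relative advantage (DeepSeekMath).
# For one prompt, several completions are sampled and scored; each score is
# normalized against the group's statistics instead of a learned baseline.
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are identical
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one math problem, scored 0/1 for correctness.
# Correct answers get positive advantage, incorrect ones negative.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group average are reinforced and those below are discouraged, which is what removes the need for a separate critic model.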
Key Capabilities
- Enhanced Reasoning: Leverages the GRPO method for improved logical and mathematical reasoning.
- Large Context Window: Supports a context length of 131,072 tokens, allowing it to process very long inputs.
- TRL Framework: Developed with the TRL (Transformer Reinforcement Learning) library, indicating a reinforcement learning-based fine-tuning process.
Good for
- Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning and problem-solving.
- Complex Reasoning Tasks: Suitable for scenarios where a model needs to follow intricate logical steps.
- Research and Development: Useful for exploring the impact of GRPO on smaller, instruction-tuned models.
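For the use cases above, the model can be run with the Hugging Face transformers library like any other Qwen2.5 chat model. The sketch below is a minimal, hypothetical usage example: the system prompt wording and generation settings are illustrative choices, not taken from the model card, and the heavy imports are deferred so the prompt helper stays dependency-free.

```python
MODEL_ID = "thangvip/qwen2.5-1.5b-dspo-no-sft-sgd-linear"


def build_messages(question: str) -> list[dict]:
    """Chat-format a question for the instruct-tuned model.

    The system prompt here is an illustrative choice, not prescribed
    by the model card.
    """
    return [
        {"role": "system", "content": "You are a helpful assistant. Reason step by step."},
        {"role": "user", "content": question},
    ]


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a completion (downloads weights on first use)."""
    # Imported lazily so build_messages works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Given the long context window, `max_new_tokens` and the input length can be raised substantially for multi-step derivations.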