Model Overview
This model, thangvip/qwen2.5-1.5b-seq-dspo-sgd-linear, is a 1.5 billion parameter language model fine-tuned from the base Qwen/Qwen2.5-1.5B-Instruct model. It was trained with the TRL library.
Training Methodology
A key differentiator for this model is its training procedure, which incorporates GRPO (Group Relative Policy Optimization). This method was originally introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO dispenses with a separate value model: for each prompt it samples a group of completions, scores them with a reward function, and normalizes each completion's reward against the group's mean and standard deviation to obtain advantages. Applying GRPO here aims to improve the model's performance, most plausibly in reasoning and instruction following, building on the capabilities of its Qwen2.5-1.5B-Instruct base.
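The group-relative normalization described above can be sketched in a few lines. This is an illustrative, stdlib-only sketch of the advantage computation that gives GRPO its name, not the actual TRL implementation (which operates on token-level tensors and combines these advantages with a clipped policy-gradient objective and a KL penalty):

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each completion's reward against the mean and
    standard deviation of its sampling group (GRPO-style).

    `rewards` holds the scalar rewards of all completions sampled
    for a single prompt.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std; guard against zero below
    if std == 0:
        # All completions scored identically: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: four completions sampled for one prompt, scored by a reward function.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Because advantages are centered within each group, completions are rewarded only relative to their siblings, which is what removes the need for a learned value baseline.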
Key Features
- Base Model: Qwen2.5-1.5B-Instruct
- Parameter Count: 1.5 Billion
- Context Length: 32768 tokens
- Training Method: Fine-tuned using GRPO via the TRL framework.
Potential Use Cases
Given its fine-tuning with GRPO, this model is likely well-suited for applications requiring:
- Instruction Following: Generating responses based on specific user instructions.
- Reasoning Tasks: Problems that benefit from improved logical coherence, particularly mathematical reasoning, the domain GRPO was originally developed for in DeepSeekMath.
- General Text Generation: Producing high-quality, contextually relevant text outputs.