thangvip/qwen2.5-1.5b-grpo-no-sft-sgd-linear
thangvip/qwen2.5-1.5b-grpo-no-sft-sgd-linear is a 1.5 billion parameter causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. This model is trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in DeepSeekMath, to strengthen its reasoning capabilities. With a context length of 32768 tokens, it is suited to tasks that benefit from extended, structured reasoning, particularly mathematical or logical problem-solving, building on its Qwen2.5 base architecture.
Model Overview
thangvip/qwen2.5-1.5b-grpo-no-sft-sgd-linear is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages a notable training innovation: the GRPO (Group Relative Policy Optimization) method. This reinforcement learning technique was introduced in the DeepSeekMath paper to push the limits of mathematical reasoning in open language models.
Key Characteristics
- Base Model: Qwen2.5-1.5B-Instruct, providing a strong foundation for general language understanding and generation.
- Fine-tuning Method: Utilizes GRPO, a reinforcement learning approach, to enhance specific reasoning capabilities.
- Context Length: Supports a substantial context window of 32768 tokens, allowing for processing longer inputs and maintaining coherence over extended interactions.
- Training Framework: Trained using Hugging Face's TRL library, which provides the reinforcement learning training loop used for GRPO fine-tuning.
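A GRPO fine-tune along these lines can be sketched with TRL's `GRPOTrainer`. This is a hedged illustration, not the actual training recipe: the card does not publish the reward function or dataset, so the exact-match math reward and the GSM8K dataset below are illustrative assumptions (the `sgd`/`linear` settings merely echo hints in the repo name):

```python
# Sketch: GRPO fine-tuning with TRL (assumes trl >= 0.14, which ships GRPOTrainer).
# The reward function and dataset are illustrative placeholders, not the
# actual recipe used for this model.
import re


def exact_answer_reward(completions, answer, **kwargs):
    """Reward 1.0 when the completion's final number matches the reference.

    GSM8K-style references end with '#### <number>', so we take the text
    after the last '####' as the target.
    """
    rewards = []
    for completion, ref in zip(completions, answer):
        target = ref.split("####")[-1].strip()
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == target else 0.0)
    return rewards


def train():
    # Heavy imports kept inside the function so the reward logic above
    # stays testable without trl/datasets installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    config = GRPOConfig(
        output_dir="qwen2.5-1.5b-grpo",
        num_generations=8,           # completions sampled per prompt (the "group")
        max_completion_length=512,
        learning_rate=1e-6,
        optim="sgd",                 # assumption, echoing "sgd" in the repo name
        lr_scheduler_type="linear",  # assumption, echoing "linear" in the repo name
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-1.5B-Instruct",
        reward_funcs=exact_answer_reward,
        args=config,
        train_dataset=load_dataset("openai/gsm8k", "main", split="train"),
    )
    trainer.train()
```

The group-relative part of GRPO comes from `num_generations`: each prompt's sampled completions are scored by the reward function, and each completion's advantage is computed relative to its own group, removing the need for a separate value model.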
Potential Use Cases
This model is particularly well-suited for applications where improved reasoning, especially in areas like mathematics or complex problem-solving, is beneficial. Its GRPO fine-tuning suggests an advantage in tasks requiring more structured and logical thought processes compared to its base model.
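For such use cases, the model can be loaded like any Qwen2.5 checkpoint with the transformers library. A minimal inference sketch, assuming the standard Qwen2.5 chat template; the system prompt and sampling settings are illustrative choices, not values from this card:

```python
# Sketch: load the fine-tuned model from the Hub and answer a math question.
MODEL_ID = "thangvip/qwen2.5-1.5b-grpo-no-sft-sgd-linear"


def build_messages(question: str) -> list:
    """Wrap a user question in a chat message list for the chat template."""
    return [
        {"role": "system", "content": "You are a helpful math assistant."},
        {"role": "user", "content": question},
    ]


def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    # Heavy imports kept here so the helper above is usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    prompt = tokenizer.apply_chat_template(
        build_messages(question),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate_answer("What is 17 * 24?")` downloads the ~1.5B parameter weights on first use, so expect a few GB of disk and memory.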