AbbottYang/Qwen2-0.5B-GRPO-test
AbbottYang/Qwen2-0.5B-GRPO-test is a 0.5-billion-parameter causal language model, fine-tuned from Qwen/Qwen2-0.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper for enhancing mathematical reasoning. The model supports a 32,768-token context length and targets tasks that benefit from stronger reasoning.
Model Overview
AbbottYang/Qwen2-0.5B-GRPO-test is a 0.5-billion-parameter language model, fine-tuned from the base Qwen/Qwen2-0.5B-Instruct model, with a 32,768-token context window.
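A minimal loading sketch with the Transformers library is shown below; it assumes the checkpoint keeps the standard Qwen2 configuration, which is where the 32,768-token context window comes from.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AbbottYang/Qwen2-0.5B-GRPO-test"

# Download the tokenizer and weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The 32,768-token context window stated on this card should be visible in
# the config (assuming the checkpoint keeps the standard Qwen2 settings).
print(model.config.max_position_embeddings)  # expected: 32768
```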
Key Differentiator: GRPO Training
This model's primary distinction is its training methodology. It was fine-tuned with GRPO (Group Relative Policy Optimization), the reinforcement-learning method detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO is designed to enhance a model's reasoning capabilities, particularly in mathematical contexts.
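In brief, GRPO forgoes the separate value model used in PPO-style RLHF. For each prompt it samples a group of G completions, scores each with a reward, and normalizes the rewards within the group to obtain per-completion advantages, as defined in the DeepSeekMath paper:

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \ldots, r_G\})}{\operatorname{std}(\{r_1, \ldots, r_G\})}
$$

Completions scoring above their group's mean are reinforced and those below are discouraged, which is the mechanism behind the reasoning gains reported in the paper.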
Training Framework
The fine-tuning was performed with the TRL library, using the following versions:
- TRL: 0.29.0
- Transformers: 5.3.0
- PyTorch: 2.7.0+cu128
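The card does not include the training script, but a minimal sketch of a GRPO fine-tune with TRL's GRPOTrainer might look like the following. The dataset and the length-based reward function are illustrative placeholders from the TRL documentation, not the ones used for this model.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Illustrative prompt dataset; the actual training data for this model
# is not documented on the card.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward function: GRPOTrainer accepts arbitrary Python callables that
# score a batch of completions. A real setup for this model would instead
# reward correct mathematical answers.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="Qwen2-0.5B-GRPO-test", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # base model named on this card
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```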
Potential Use Cases
Given its GRPO-based training, this model is potentially suitable for:
- Tasks requiring improved logical reasoning.
- Applications where mathematical problem-solving is a component (see the usage sketch after this list).
- As a base for further fine-tuning on specific reasoning-intensive datasets.
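A hypothetical usage sketch for the math-reasoning case, using the Transformers text-generation pipeline's chat support; the prompt and generation length are illustrative choices, not values from this card.

```python
from transformers import pipeline

# Recent Transformers versions apply the model's chat template
# automatically when the pipeline is given a list of messages.
pipe = pipeline("text-generation", model="AbbottYang/Qwen2-0.5B-GRPO-test")

messages = [
    {
        "role": "user",
        "content": "A train travels 120 km in 1.5 hours. "
                   "What is its average speed in km/h? Think step by step.",
    }
]

result = pipe(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])
```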