Overview
lhkhiem28/Qwen2.5-3B-grpo is a 3.1-billion-parameter language model fine-tuned from the base Qwen/Qwen2.5-3B model. It was trained on the lhkhiem28/HA-GRPO-datasets dataset using the TRL framework.
Key Capabilities
- Enhanced Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This suggests a focus on improving the model's ability to handle complex logical and mathematical problems.
- Instruction Following: As a fine-tuned model, it is designed to follow instructions effectively, making it suitable for interactive applications.
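For interactive use, the model can be loaded with the Hugging Face transformers library like any other Qwen2.5 variant. The sketch below is a minimal, hedged example: the model id comes from this card, and it assumes the tokenizer ships the standard Qwen2.5 chat template (the `build_chat` helper is illustrative, not part of the model's API).

```python
MODEL_ID = "lhkhiem28/Qwen2.5-3B-grpo"

def build_chat(question):
    """Wrap a user question in the chat-message format
    expected by tokenizer.apply_chat_template."""
    return [{"role": "user", "content": question}]

if __name__ == "__main__":
    # Imported here because loading the weights requires the
    # transformers package and downloads ~6 GB of parameters.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    prompt = tokenizer.apply_chat_template(
        build_chat("What is 17 * 23?"),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    ))
```

Sampling parameters (temperature, `max_new_tokens`) should be tuned for the task; reasoning-tuned models often benefit from a generous generation budget.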
Training Details
The model was trained using TRL (Transformer Reinforcement Learning) with the GRPO method. The training run can be further explored via the Weights & Biases link provided on the model card. Framework versions used include TRL 0.18.0.dev0, Transformers 4.52.0.dev0, PyTorch 2.6.0, Datasets 4.8.4, and Tokenizers 0.21.4.
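The distinguishing step in GRPO is how it scores completions: instead of a learned value function, it samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. The sketch below illustrates that normalization in plain Python; it is a simplified illustration of the idea from the DeepSeekMath paper, not the TRL implementation.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Group-relative advantage as used in GRPO: each sampled
    completion's reward is standardized against the mean and
    (population) std of its own sampling group, so no separate
    value network is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: rewards for 4 completions sampled from one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
# Advantages are centered: above-average completions get positive
# advantage, below-average ones negative.
```

These advantages then weight the policy-gradient update on each completion's tokens, pushing the model toward completions that scored above their group's average.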
Good For
- Applications requiring improved mathematical reasoning.
- Tasks benefiting from advanced logical problem-solving capabilities.
- Developers looking for a Qwen2.5-3B variant with specialized reasoning enhancements.