abhi14/test-grpo-delete-me
The abhi14/test-grpo-delete-me model is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with the TRL library using GRPO, a reinforcement-learning method designed to enhance mathematical reasoning. The model targets tasks that require mathematical problem-solving and logical deduction.
Overview
This model, abhi14/test-grpo-delete-me, is a 1.5 billion parameter language model fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was trained with the TRL framework.
Key Differentiator: GRPO Training
A significant aspect of this model is its training methodology, which incorporates GRPO (Group Relative Policy Optimization). GRPO was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". Instead of training a separate value function (critic), it samples a group of completions per prompt and normalizes each completion's reward against the group's statistics, which makes it a good fit for verifiable reward signals such as mathematical problem-solving.
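The group-relative idea above can be sketched in a few lines. This is a minimal illustration of the advantage computation only (the policy-gradient update and KL term are omitted); the reward values are made up for the example:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: each completion's reward is normalized
    against the mean and std of its own sampled group, so no learned
    value function (critic) is needed as a baseline."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored identically: no preference signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Four completions sampled for the same math prompt, scored by a
# (hypothetical) correctness-based reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions above the group mean get positive advantages and are reinforced; those below are suppressed, all without a critic network.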
Training Frameworks
- TRL: Version 1.2.0
- Transformers: Version 5.6.2
- PyTorch: Version 2.11.0
- Datasets: Version 4.8.4
- Tokenizers: Version 0.22.2
Potential Use Cases
Given its fine-tuning from an instruction-following model and the application of GRPO, this model is likely well-suited for:
- Tasks requiring mathematical reasoning.
- Instruction-following in contexts that benefit from logical deduction.
- Applications where a smaller, specialized model for numerical or logical problems is preferred.
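For such applications, the model can be loaded through the standard Transformers text-generation pipeline. A minimal sketch is below; the model id comes from this card, but the generation settings (`max_new_tokens`, greedy decoding) are illustrative choices, not values from the card, and `transformers` is imported lazily inside the function:

```python
def solve_problem(prompt: str, model_id: str = "abhi14/test-grpo-delete-me") -> str:
    """Generate a completion with the fine-tuned model.

    Sampling settings here are illustrative; tune them for your task.
    Requires the `transformers` package (and a backend such as PyTorch).
    """
    from transformers import pipeline  # lazy import: only needed at call time

    generator = pipeline("text-generation", model=model_id)
    out = generator(prompt, max_new_tokens=256, do_sample=False)
    return out[0]["generated_text"]
```

Downloading a 1.5B-parameter checkpoint requires a few gigabytes of disk and, for comfortable latency, a GPU; CPU inference works but is slow.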