Name: cjiao/goldengoose-gumbel_gmrel_tau1.00-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-gumbel_gmrel_tau1.00-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages the TRL (Transformer Reinforcement Learning) framework for its training process.

Key Differentiator: GRPO Method

A significant aspect of this model's training is the application of the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method. This technique, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", aims to enhance the model's mathematical reasoning abilities. By integrating GRPO, this model is designed to perform more effectively on tasks that require logical and mathematical problem-solving.

Training Details

The model was trained with specific versions of popular frameworks:

TRL: 0.19.1
Transformers: 4.57.6
Pytorch: 2.5.1
Datasets: 4.8.4
Tokenizers: 0.22.2

Potential Use Cases

Given its fine-tuning with the GRPO method, this model is particularly well-suited for:

Mathematical problem-solving: Tasks involving arithmetic, algebra, or more complex mathematical reasoning.
Logical inference: Scenarios where structured logical thinking is required.
Instruction following: Benefiting from its Qwen2.5-Instruct base, it can handle various instruction-based prompts, potentially with improved reasoning for quantitative questions.

Overview

Model Overview

Key Differentiator: GRPO Method

Training Details

Potential Use Cases

Full Model Card (README)