Name: cjiao/goldengoose-gumbel_tau2.00-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

cjiao/goldengoose-gumbel_tau2.00-25grp is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It keeps the decoder-only transformer architecture of the Qwen2.5 family.

Key Differentiator: GRPO Training

This model's primary distinction is its training method. It was fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO is a variant of PPO that drops the separate value (critic) model: for each prompt it samples a group of completions and uses the group's reward statistics as the baseline. In that paper, GRPO was used to improve mathematical reasoning.
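The group-relative baseline at the heart of GRPO can be illustrated with a short sketch. For one prompt, a group of completions is sampled and scored, and each completion's reward is normalized by the mean and standard deviation of its own group to give its advantage (the outcome-supervision case described in DeepSeekMath). The function name, the choice of population standard deviation, and the `eps` constant are illustrative assumptions, not details taken from this model's training code.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds the scalar rewards for the G completions sampled from
    a single prompt. `eps` is an assumed small constant that avoids
    division by zero when every completion receives the same reward.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mean = statistics.fmean(rewards)
    # Population standard deviation over the group (an assumption; some
    # implementations use the sample standard deviation instead).
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt: two judged correct (1.0), two wrong (0.0).
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no learned value model is needed; a group where every answer scores the same contributes no learning signal.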

Training Details

Base Model: Qwen/Qwen2.5-1.5B-Instruct
Training Framework: TRL (Transformer Reinforcement Learning) version 0.19.1
Methodology: GRPO, as detailed in the DeepSeekMath research.
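The card does not state which reward signal was used to train this model. Purely as a hypothetical illustration, the sketch below is a rule-based exact-match reward written in the shape that TRL's GRPOTrainer accepts for `reward_funcs` with a standard (non-conversational) prompt format: a callable that receives the sampled completions as plain strings, with dataset columns passed as keyword arguments, and returns one float per completion. The "Answer: <number>" output convention and the `answer` dataset column are assumptions.

```python
import re
from typing import List, Optional

# Hypothetical convention: the model is prompted to finish with "Answer: <number>".
_ANSWER_RE = re.compile(r"Answer:\s*(-?\d+(?:\.\d+)?)")


def extract_answer(text: str) -> Optional[str]:
    """Return the last "Answer: <number>" value in the text, or None."""
    matches = _ANSWER_RE.findall(text)
    return matches[-1] if matches else None


def exact_match_reward(completions: List[str], answer: List[str], **kwargs) -> List[float]:
    """Score 1.0 when the extracted final answer equals the reference, else 0.0.

    `answer` is an assumed dataset column holding the reference answer for
    each prompt; any other columns arrive in **kwargs and are ignored.
    """
    rewards = []
    for completion, reference in zip(completions, answer):
        predicted = extract_answer(completion)
        is_correct = predicted is not None and predicted == reference.strip()
        rewards.append(1.0 if is_correct else 0.0)
    return rewards
```

In TRL such a function would be passed as `GRPOTrainer(..., reward_funcs=exact_match_reward)`. The string comparison is deliberately simple and treats "4" and "4.0" as different answers.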

Use Cases

Given its GRPO training, this model is intended for:

Tasks requiring mathematical reasoning.
Applications where logical deduction and problem-solving are critical.
Instruction-following scenarios benefiting from improved reasoning abilities.
