Name: cjiao/goldengoose-gumbel_tau1.00-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-gumbel_tau1.00-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was developed by cjiao and trained using the TRL framework.

Key Capabilities

Enhanced Mathematical Reasoning: This model was specifically trained with GRPO (Gumbel-softmax Reinforcement Learning for Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This training approach aims to improve the model's ability to handle complex mathematical and logical reasoning tasks.
Instruction Following: As a fine-tuned instruction model, it is designed to follow user prompts effectively, building on the capabilities of its Qwen2.5-Instruct base.

Training Details

The model's training procedure leveraged the TRL library, with specific versions including TRL 0.19.1, Transformers 4.57.6, Pytorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2. The application of the GRPO method suggests an optimization for tasks where precise, step-by-step reasoning is crucial.

Good For

Applications requiring improved mathematical problem-solving.
Tasks benefiting from enhanced logical reasoning capabilities.
Instruction-following scenarios where the base Qwen2.5-Instruct model's performance needs a boost in reasoning accuracy.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)