cjiao/goldengoose-gumbel_tau1.00-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 22, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_tau1.00-25grp is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, this model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is particularly suited for tasks requiring improved logical and mathematical problem-solving, building upon the base Qwen2.5 architecture.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel_tau1.00-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was developed by cjiao and trained using the TRL framework.

Key Capabilities

  • Enhanced Mathematical Reasoning: This model was specifically trained with GRPO (Gumbel-softmax Reinforcement Learning for Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This training approach aims to improve the model's ability to handle complex mathematical and logical reasoning tasks.
  • Instruction Following: As a fine-tuned instruction model, it is designed to follow user prompts effectively, building on the capabilities of its Qwen2.5-Instruct base.

Training Details

The model's training procedure leveraged the TRL library, with specific versions including TRL 0.19.1, Transformers 4.57.6, Pytorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2. The application of the GRPO method suggests an optimization for tasks where precise, step-by-step reasoning is crucial.

Good For

  • Applications requiring improved mathematical problem-solving.
  • Tasks benefiting from enhanced logical reasoning capabilities.
  • Instruction-following scenarios where the base Qwen2.5-Instruct model's performance needs a boost in reasoning accuracy.