cjiao/goldengoose-gumbel-0.10-100

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 7, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel-0.10-100 model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, it utilizes the GRPO training method, which is designed to enhance mathematical reasoning capabilities. This model is suitable for tasks requiring improved logical and mathematical problem-solving, building upon its Qwen2.5 base with a context length of 32768 tokens.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel-0.10-100 is a 1.5 billion parameter instruction-tuned language model, building upon the foundation of Qwen/Qwen2.5-1.5B-Instruct. It has been fine-tuned using the TRL framework and incorporates the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) training method.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model's training with GRPO, a method introduced in the DeepSeekMath paper, suggests an optimization for mathematical and logical problem-solving tasks.
  • Instruction Following: As an instruction-tuned model, it is designed to follow user prompts effectively.
  • Qwen2.5 Base: Benefits from the robust architecture and general language understanding of the Qwen2.5 series.

Training Details

The model was trained by cjiao using the TRL library (version 0.19.1) and PyTorch (version 2.5.1). The GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), was central to its fine-tuning process.

Good For

  • Applications requiring improved mathematical reasoning.
  • General instruction-following tasks where a compact yet capable model is needed.
  • Experimentation with models fine-tuned using advanced reinforcement learning techniques like GRPO.