cjiao/goldengoose-gumbel_gradsim_tau0.50-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 24, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_gradsim_tau0.50-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring improved mathematical problem-solving and logical deduction, leveraging its 32768 token context length.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel_gradsim_tau0.50-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It utilizes a 32768 token context length, making it suitable for processing longer inputs.

Key Differentiator: GRPO Training

This model's primary distinction lies in its training methodology. It was fine-tuned using the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method, as introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This technique is specifically designed to enhance a model's capabilities in mathematical reasoning and problem-solving tasks.

Training Details

The fine-tuning process was conducted using the TRL (Transformer Reinforcement Learning) framework. The training run details are available for visualization on Weights & Biases. The framework versions used include TRL 0.19.1, Transformers 4.57.6, Pytorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2.

Use Cases

Given its GRPO-based training, this model is particularly well-suited for applications requiring:

  • Mathematical reasoning: Solving complex math problems and logical puzzles.
  • Instruction following: Generating responses based on detailed instructions, especially those with a mathematical or logical component.
  • General language generation: Leveraging the strong base capabilities of the Qwen2.5-1.5B-Instruct model.