cjiao/goldengoose-gumbel_tau2.00-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 22, 2026Architecture:Transformer Warm

cjiao/goldengoose-gumbel_tau2.00-25grp is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring robust reasoning, particularly in mathematical contexts, leveraging its Qwen2.5 base architecture.

Loading preview...

Model Overview

cjiao/goldengoose-gumbel_tau2.00-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages the Qwen2.5 architecture, known for its strong performance across various language tasks.

Key Differentiator: GRPO Training

This model's primary distinction lies in its training methodology. It was fine-tuned using GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This training approach is specifically designed to enhance a model's capabilities in mathematical reasoning.

Training Details

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Training Framework: TRL (Transformer Reinforcement Learning) version 0.19.1
  • Methodology: GRPO, as detailed in the DeepSeekMath research.

Use Cases

Given its GRPO-enhanced training, this model is particularly well-suited for:

  • Tasks requiring mathematical reasoning.
  • Applications where logical deduction and problem-solving are critical.
  • Instruction-following scenarios benefiting from improved reasoning abilities.