cjiao/goldengoose-gumbel_tau0.50-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 22, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_tau0.50-25grp is a 1.5 billion parameter causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct with a 32K context length. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring improved logical and mathematical problem-solving, building upon the base Qwen2.5 architecture.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel_tau0.50-25grp is a 1.5 billion parameter language model derived from the Qwen/Qwen2.5-1.5B-Instruct base model. It features a substantial context length of 32,768 tokens, making it suitable for processing longer inputs.

Key Differentiator: GRPO Training

This model's primary distinction lies in its training methodology. It was fine-tuned using the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method, as introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This technique is specifically designed to improve a model's capabilities in mathematical reasoning and problem-solving.

Training Details

The fine-tuning process utilized the TRL (Transformer Reinforcement Learning) library. The training leveraged specific versions of key frameworks:

  • TRL: 0.19.1
  • Transformers: 4.57.6
  • Pytorch: 2.5.1
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Use Cases

Given its GRPO-enhanced training, this model is particularly well-suited for applications requiring:

  • Mathematical problem-solving: Tasks that involve numerical reasoning, equations, and logical deduction.
  • Instruction following: Leveraging its Qwen2.5-Instruct base for general instruction-tuned capabilities.
  • Long-context understanding: Benefiting from its 32K context window for complex, multi-step reasoning problems.