cjiao/goldengoose-gumbel_combined_gmrel_tau2.00-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 30, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_combined_gmrel_tau2.00-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, it leverages the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method, as introduced in DeepSeekMath, to enhance mathematical reasoning capabilities. This model is specifically optimized for tasks requiring robust mathematical and logical problem-solving, building upon the strong foundation of the Qwen2.5 architecture with a 32K context length.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel_combined_gmrel_tau2.00-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was developed by cjiao and utilizes the TRL (Transformer Reinforcement Learning) framework for its training process.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It employs GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization), a technique detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This method is designed to enhance the model's mathematical reasoning abilities, suggesting a specialization in tasks that require complex logical and numerical problem-solving.

Technical Specifications

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Parameters: 1.5 Billion
  • Context Length: 32,768 tokens
  • Training Framework: TRL (version 0.19.1)

Potential Use Cases

Given its GRPO-enhanced training, this model is likely well-suited for applications requiring:

  • Mathematical problem-solving: From basic arithmetic to more complex algebraic or calculus-based questions.
  • Logical reasoning tasks: Where structured thought and step-by-step deduction are crucial.
  • Instruction following in technical domains: Especially those with a quantitative component.

Developers can quickly get started using the Hugging Face transformers pipeline for text generation, as demonstrated in the quick start guide.