cjiao/goldengoose-gumbel_gmrel_tau1.00-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 24, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_gmrel_tau1.00-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the TRL framework and incorporates the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is particularly suited for tasks requiring improved logical and mathematical problem-solving, building upon its Qwen2.5 base.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel_gmrel_tau1.00-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages the TRL (Transformer Reinforcement Learning) framework for its training process.

Key Differentiator: GRPO Method

A significant aspect of this model's training is the application of the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method. This technique, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", aims to enhance the model's mathematical reasoning abilities. By integrating GRPO, this model is designed to perform more effectively on tasks that require logical and mathematical problem-solving.

Training Details

The model was trained with specific versions of popular frameworks:

  • TRL: 0.19.1
  • Transformers: 4.57.6
  • Pytorch: 2.5.1
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Potential Use Cases

Given its fine-tuning with the GRPO method, this model is particularly well-suited for:

  • Mathematical problem-solving: Tasks involving arithmetic, algebra, or more complex mathematical reasoning.
  • Logical inference: Scenarios where structured logical thinking is required.
  • Instruction following: Benefiting from its Qwen2.5-Instruct base, it can handle various instruction-based prompts, potentially with improved reasoning for quantitative questions.