cjiao/goldengoose-gumbel_combined_grpoc_tau0.10-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 28, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_combined_grpoc_tau0.10-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the TRL library and incorporates the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method, which is designed to enhance mathematical reasoning capabilities. This model is particularly suited for tasks requiring improved logical and mathematical problem-solving, building upon its Qwen2.5 base.

Loading preview...

Model Overview

cjiao/goldengoose-gumbel_combined_grpoc_tau0.10-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct architecture. This model leverages the TRL (Transformer Reinforcement Learning) library for its training process.

Key Capabilities

  • Enhanced Reasoning: The model was specifically trained using the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method. This technique, introduced in the context of improving mathematical reasoning in large language models, suggests an emphasis on logical and problem-solving tasks.
  • Instruction Following: As a fine-tuned version of an instruction-tuned base model (Qwen2.5-1.5B-Instruct), it is designed to follow user instructions effectively.

Training Details

The training procedure utilized GRPO, a method detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The training was conducted using TRL version 0.19.1, Transformers 4.57.6, Pytorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2.

Good For

  • Applications requiring improved mathematical or logical reasoning.
  • Tasks where robust instruction following is crucial.
  • Developers looking for a compact 1.5B parameter model with specialized reasoning enhancements.