cjiao/goldengoose-gumbel_gmrel_tau2.00-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 24, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_gmrel_tau2.00-25grp model is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, this model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. With a context length of 32768 tokens, it is optimized for tasks requiring advanced logical and mathematical problem-solving.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel_gmrel_tau2.00-25grp is a 1.5 billion parameter instruction-tuned language model, building upon the base architecture of Qwen/Qwen2.5-1.5B-Instruct. It features a substantial context length of 32768 tokens, making it suitable for processing longer inputs and generating comprehensive responses.

Key Differentiator: GRPO Training

What sets this model apart is its training methodology. It was fine-tuned using GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization), a technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This specialized training aims to significantly improve the model's capabilities in mathematical reasoning and complex problem-solving.

Potential Use Cases

  • Mathematical Problem Solving: Ideal for applications requiring logical deduction and mathematical computation.
  • Complex Reasoning Tasks: Suitable for scenarios where structured, step-by-step reasoning is crucial.
  • Instruction Following: Benefits from its instruction-tuned base, making it responsive to detailed prompts.

Training Details

The model was trained using the TRL framework (version 0.19.1) and leverages PyTorch (version 2.5.1). The training process is publicly logged and can be visualized via Weights & Biases.