cjiao/goldengoose-gumbel_combined_gmrel_tau0.50-25grp

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 30, 2026Architecture:Transformer Cold

The cjiao/goldengoose-gumbel_combined_gmrel_tau0.50-25grp is a 1.5 billion parameter language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring advanced reasoning, particularly in mathematical contexts, building upon the robust foundation of the Qwen2.5 architecture.

Loading preview...

Overview

This model, cjiao/goldengoose-gumbel_combined_gmrel_tau0.50-25grp, is a 1.5 billion parameter language model derived from the Qwen2.5-1.5B-Instruct base model. It has been specifically fine-tuned using the TRL library and incorporates the GRPO (Gumbel-Softmax Reinforcement Learning with Policy Optimization) training method.

Key Capabilities

  • Enhanced Reasoning: The primary differentiator of this model is its training with GRPO, a method introduced in the DeepSeekMath paper. This suggests an optimization for complex reasoning tasks, particularly in mathematical domains.
  • Instruction Following: As it is fine-tuned from an instruction-tuned base model (Qwen2.5-1.5B-Instruct), it retains strong capabilities in following user instructions.
  • Efficient Size: With 1.5 billion parameters, it offers a balance between performance and computational efficiency, making it suitable for deployment in resource-constrained environments.

Training Details

The model's training leveraged the TRL (Transformer Reinforcement Learning) framework, version 0.19.1, and PyTorch 2.5.1. The GRPO method, as detailed in the DeepSeekMath research, was central to its fine-tuning process.

Good For

  • Applications requiring robust mathematical reasoning.
  • Tasks where a smaller, efficient model with enhanced reasoning capabilities is preferred.
  • Instruction-following tasks where the base Qwen2.5-1.5B-Instruct model's performance is desired, with potential improvements in reasoning.