cjiao/goldengoose-gumbel_gradsim_tau0.10-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 24, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_gradsim_tau0.10-25grp is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities in open language models. This model is optimized for tasks requiring robust mathematical problem-solving and logical deduction, building upon the Qwen2.5 architecture. Its primary strength lies in its specialized training for complex reasoning, making it suitable for applications demanding high accuracy in structured problem-solving.

Loading preview...

Model Overview

This model, cjiao/goldengoose-gumbel_gradsim_tau0.10-25grp, is a specialized 1.5 billion parameter instruction-tuned language model. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, developed by cjiao. A key differentiator is its training methodology: it leverages the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method, as introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This training approach aims to significantly improve the model's proficiency in mathematical reasoning and complex problem-solving.

Key Capabilities

  • Enhanced Mathematical Reasoning: Specifically trained with the GRPO method to excel in tasks requiring logical deduction and mathematical problem-solving.
  • Instruction Following: Inherits strong instruction-following capabilities from its Qwen2.5-1.5B-Instruct base.
  • Efficient Size: At 1.5 billion parameters, it offers a balance between performance and computational efficiency for specialized tasks.

Good For

  • Applications requiring robust mathematical reasoning.
  • Tasks involving logical problem-solving and structured output.
  • Developers looking for a compact model with specialized reasoning capabilities.