cjiao/goldengoose-gumbel_gmrel_tau0.50-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 24, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_gmrel_tau0.50-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, it utilizes the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method, as introduced in the DeepSeekMath paper, to enhance mathematical reasoning capabilities. With a context length of 32768 tokens, this model is optimized for tasks requiring robust reasoning, particularly in mathematical domains.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel_gmrel_tau0.50-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method, a technique highlighted in the research behind DeepSeekMath, to improve its reasoning abilities.

Key Characteristics

  • Base Model: Qwen2.5-1.5B-Instruct, providing a strong foundation for instruction following.
  • Training Method: Utilizes GRPO, a reinforcement learning approach designed to push the limits of mathematical reasoning in language models.
  • Context Length: Supports a substantial context window of 32768 tokens, allowing for processing longer inputs and maintaining conversational coherence over extended interactions.
  • Frameworks: Trained using the TRL library, with specific versions of Transformers, Pytorch, Datasets, and Tokenizers.

Use Cases

This model is particularly well-suited for applications requiring:

  • Mathematical Reasoning: Its training with GRPO suggests enhanced performance on tasks involving mathematical problem-solving and logical deduction.
  • Instruction Following: As a fine-tuned instruction model, it can effectively respond to a wide range of user prompts and commands.
  • General Text Generation: Capable of generating coherent and contextually relevant text for various purposes, building on the capabilities of its Qwen2.5 base.