cjiao/goldengoose-gumbel_combined_gmrel_tau0.10-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 30, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_combined_gmrel_tau0.10-25grp model is a 1.5 billion parameter instruction-tuned language model developed by cjiao, based on the Qwen2.5-1.5B-Instruct architecture. It was fine-tuned using the TRL library and incorporates the GRPO training method, as introduced in the DeepSeekMath paper, which is typically associated with enhancing mathematical reasoning. With a context length of 32768 tokens, this model is designed for general text generation tasks, potentially offering improved reasoning capabilities due to its specialized training approach.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel_combined_gmrel_tau0.10-25grp is a 1.5 billion parameter instruction-tuned language model, building upon the Qwen/Qwen2.5-1.5B-Instruct base. Developed by cjiao, this model leverages the TRL (Transformer Reinforcement Learning) library for its fine-tuning process.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was trained using GRPO (Gumbel-Softmax Policy Optimization), a method highlighted in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). While the base model is general-purpose, the application of GRPO suggests an optimization for tasks that may benefit from enhanced reasoning or structured output generation, similar to its use in mathematical reasoning contexts.

Capabilities & Use Cases

This model is suitable for various text generation tasks, including answering questions, creative writing, and conversational AI, given its instruction-tuned nature. Its 32768-token context length allows for processing and generating longer sequences of text. The GRPO training could potentially make it more robust in tasks requiring logical coherence or adherence to specific patterns, distinguishing it from standard instruction-tuned models of similar size.