cjiao/goldengoose-gumbel-1.00-100

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 7, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel-1.00-100 model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, it utilizes the GRPO training method, which is designed to enhance mathematical reasoning capabilities. With a context length of 32768 tokens, this model is optimized for tasks requiring robust logical and mathematical problem-solving.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel-1.00-100 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages a substantial context length of 32768 tokens, making it suitable for processing longer inputs and maintaining conversational coherence over extended interactions.

Key Training Details

This model was trained using the GRPO (Gumbel-Softmax Policy Optimization) method, as introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The application of GRPO suggests a focus on improving the model's ability to handle complex reasoning tasks, particularly in mathematical domains. The fine-tuning process was conducted using the TRL (Transformer Reinforcement Learning) framework.

Potential Use Cases

Given its foundation in Qwen2.5-1.5B-Instruct and specialized training with GRPO, this model is likely well-suited for:

  • Mathematical Reasoning: Tasks involving problem-solving, logical deduction, and quantitative analysis.
  • Instruction Following: Generating responses based on specific user instructions, benefiting from its instruction-tuned base.
  • Long Context Understanding: Applications requiring the model to process and synthesize information from extensive textual inputs due to its 32768-token context window.