cjiao/goldengoose-gumbel_combined_random-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 31, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_combined_random-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning in language models. This model is optimized for tasks requiring improved reasoning capabilities, particularly in mathematical contexts, leveraging its 32K context length.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel_combined_random-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages a substantial 32,768 token context length, making it suitable for processing longer inputs and generating more extensive responses.

Key Capabilities

  • Enhanced Reasoning: This model was specifically trained using the GRPO (Gumbel-softmax Reinforced Policy Optimization) method, as introduced in the DeepSeekMath paper. This training approach aims to push the limits of mathematical reasoning in open language models.
  • Instruction Following: As a fine-tuned instruction model, it is designed to understand and execute user prompts effectively, generating relevant and coherent text based on given instructions.

Training Details

The model's fine-tuning process utilized the TRL (Transformer Reinforcement Learning) library. The application of the GRPO method suggests a focus on improving the model's ability to handle complex logical and mathematical problems, differentiating it from standard instruction-tuned models.

Use Cases

This model is particularly well-suited for applications where robust reasoning, especially in mathematical or logical domains, is crucial. Its instruction-following capabilities also make it versatile for general text generation tasks where clarity and adherence to prompts are important.