cjiao/goldengoose-high_div_rand_weighted-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 21, 2026Architecture:Transformer Warm

The cjiao/goldengoose-high_div_rand_weighted-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, this model utilizes the GRPO training method, which is known for enhancing mathematical reasoning in language models. It is designed for general text generation tasks, particularly those benefiting from improved reasoning capabilities, and supports a context length of 32768 tokens.

Loading preview...

Model Overview

The cjiao/goldengoose-high_div_rand_weighted-25grp is a 1.5 billion parameter instruction-tuned language model, building upon the base architecture of Qwen/Qwen2.5-1.5B-Instruct. This model was developed by cjiao and fine-tuned using the TRL framework.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was fine-tuned using GRPO (Gradient-based Reward Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This suggests an optimization towards enhanced reasoning capabilities, potentially making it more robust for tasks requiring logical thought processes.

Capabilities and Use Cases

This model is suitable for a variety of text generation tasks, leveraging its instruction-tuned nature and the benefits of GRPO training. Its 1.5 billion parameters and 32768-token context length make it a capable option for applications where a smaller, efficient model with improved reasoning is desired. Developers can use it for tasks such as answering complex questions, generating creative text, or engaging in conversational AI, especially where the underlying reasoning quality is important.

Technical Details

The model was trained with specific versions of key frameworks, including TRL 0.19.1, Transformers 4.57.6, Pytorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2. Further details on the training process can be explored via the associated Weights & Biases run.