cjiao/goldengoose-high_div_rand_top-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 21, 2026Architecture:Transformer Warm

The cjiao/goldengoose-high_div_rand_top-25grp model is a 1.5 billion parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, this model utilizes the GRPO training method, which is designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring advanced mathematical problem-solving and logical deduction, building upon the robust foundation of the Qwen2.5 architecture. This model is suitable for applications where precise and accurate mathematical reasoning is critical.

Loading preview...

Model Overview

The cjiao/goldengoose-high_div_rand_top-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was developed by cjiao and trained using the TRL library.

Key Training Innovation

A significant aspect of this model is its training procedure, which incorporates GRPO (Gradient Regularized Policy Optimization). This method was introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". The application of GRPO suggests a focus on enhancing the model's capabilities in complex reasoning tasks, particularly in the domain of mathematics.

Technical Details

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Parameter Count: 1.5 Billion
  • Context Length: 32768 tokens
  • Training Frameworks: TRL (0.19.1), Transformers (4.57.6), PyTorch (2.5.1), Datasets (4.8.4), Tokenizers (0.22.2)

Potential Use Cases

Given its fine-tuning with GRPO, this model is likely well-suited for:

  • Mathematical problem-solving: Tasks requiring logical deduction and numerical accuracy.
  • Reasoning-intensive applications: Scenarios where robust analytical capabilities are needed.
  • Instruction-following: Leveraging its base as an instruction-tuned model for specific tasks.