cjiao/goldengoose-top25_gradsim-25grp

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 16, 2026Architecture:Transformer Cold

The cjiao/goldengoose-top25_gradsim-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned by cjiao based on Qwen/Qwen2.5-1.5B-Instruct. It was trained using the GRPO method, as introduced in the DeepSeekMath paper, to enhance its reasoning capabilities. This model is particularly suited for tasks requiring advanced mathematical and logical reasoning, building upon its Qwen2.5 base.

Loading preview...

Model Overview

The cjiao/goldengoose-top25_gradsim-25grp is a 1.5 billion parameter instruction-tuned language model, developed by cjiao. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, leveraging the TRL (Transformer Reinforcement Learning) framework for its training process.

Key Capabilities

  • Enhanced Reasoning: This model was specifically trained using the GRPO (Gradient-based Reward Policy Optimization) method, a technique highlighted in the DeepSeekMath paper. This training approach aims to significantly improve the model's mathematical and logical reasoning abilities.
  • Instruction Following: As an instruction-tuned model, it is designed to accurately follow user prompts and generate relevant responses.
  • Efficient Performance: With 1.5 billion parameters and a 32K context length, it offers a balance between performance and computational efficiency, making it suitable for various applications where larger models might be overkill.

Training Details

The model's training utilized the TRL library (version 0.19.1) and was conducted with PyTorch 2.5.1. The GRPO method, central to its fine-tuning, is designed to push the boundaries of mathematical reasoning in open language models.

When to Use This Model

This model is a strong candidate for applications requiring:

  • Mathematical Problem Solving: Its GRPO-based training makes it particularly adept at tasks involving mathematical reasoning.
  • Logical Deduction: The fine-tuning process aims to improve its ability to handle complex logical queries.
  • Instruction-based Generation: For scenarios where precise adherence to instructions is crucial, building on its Qwen2.5-Instruct foundation.