cjiao/golden-goose-qwen2.5-1.5b-instruct-random

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Apr 17, 2026Architecture:Transformer Warm

The cjiao/golden-goose-qwen2.5-1.5b-instruct-random is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the TRL framework and incorporates the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring robust reasoning, particularly in mathematical contexts, leveraging its 32768 token context length.

Loading preview...

Model Overview

The cjiao/golden-goose-qwen2.5-1.5b-instruct-random is a 1.5 billion parameter instruction-tuned language model, building upon the base of Qwen/Qwen2.5-1.5B-Instruct. This model was fine-tuned using the TRL framework, a library for Transformer Reinforcement Learning.

Key Differentiator: GRPO Training

A significant aspect of this model's training is the application of GRPO (Gradient-based Reward Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is specifically designed to enhance a model's mathematical reasoning abilities. By incorporating GRPO, golden-goose-qwen2.5-1.5b-instruct-random aims to improve performance on complex reasoning tasks.

Capabilities & Use Cases

  • Enhanced Mathematical Reasoning: Due to its GRPO-based training, this model is particularly suited for tasks that involve mathematical problem-solving and logical deduction.
  • Instruction Following: As an instruction-tuned model, it is designed to accurately follow user prompts and generate relevant responses.
  • Long Context Processing: With a context length of 32768 tokens, it can handle and process extensive inputs, beneficial for detailed queries or multi-turn conversations.

Training Details

The model's training procedure utilized specific versions of popular frameworks:

  • TRL: 1.1.0
  • Transformers: 4.57.6
  • Pytorch: 2.10.0

This model is a strong candidate for applications requiring a compact yet capable model with a focus on improved reasoning, especially in quantitative domains.