cjiao/goldengoose-high_div_rand_polar-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 16, 2026Architecture:Transformer Warm

cjiao/goldengoose-high_div_rand_polar-25grp is a fine-tuned instruction-following language model based on the Qwen2.5-1.5B-Instruct architecture. Developed by cjiao, this model was trained using the TRL library and incorporates the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is optimized for generating diverse and randomized responses, particularly in polar contexts, making it suitable for applications requiring varied and nuanced text generation.

Loading preview...

Overview

cjiao/goldengoose-high_div_rand_polar-25grp is an instruction-tuned language model built upon the Qwen2.5-1.5B-Instruct base model. It was developed by cjiao and fine-tuned using the TRL (Transformer Reinforcement Learning) library.

Key Training Details

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Training Method: Utilizes GRPO (Generalized Reinforcement Learning with Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This suggests an emphasis on improving reasoning capabilities, potentially in mathematical or logical domains.
  • Frameworks: Trained with TRL 0.19.1, Transformers 4.57.6, Pytorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2.

Capabilities

  • Instruction Following: Inherits instruction-following capabilities from its Qwen2.5-1.5B-Instruct base.
  • Enhanced Reasoning: The application of the GRPO method indicates a focus on improving reasoning, particularly as it stems from research in mathematical reasoning.
  • Diverse Response Generation: The model's name, "high_div_rand_polar," suggests an optimization for generating highly diverse and randomized outputs, potentially with a focus on contrasting or polarized perspectives.

Good For

  • Applications requiring varied and non-deterministic text generation.
  • Tasks where nuanced or contrasting viewpoints are beneficial.
  • Exploratory text generation and creative writing where diverse outputs are desired.
  • Use cases that could benefit from improved reasoning, especially if related to mathematical or logical problem-solving, given the GRPO method's origin.