cjiao/goldengoose-low_div_rand_polar-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 16, 2026Architecture:Transformer Warm

The cjiao/goldengoose-low_div_rand_polar-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring robust reasoning, particularly in mathematical contexts, leveraging its 32768-token context length.

Loading preview...

Model Overview

cjiao/goldengoose-low_div_rand_polar-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages a substantial 32768-token context length, making it suitable for processing longer inputs and maintaining context over extended interactions.

Key Differentiator: GRPO Training

This model's primary distinction lies in its training methodology. It was fine-tuned using GRPO (Gradient-based Reward Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This specialized training approach aims to significantly enhance the model's capabilities in mathematical reasoning and problem-solving.

Use Cases

  • Mathematical Reasoning: Ideal for applications requiring the model to understand, process, and generate responses related to mathematical problems or logical deductions.
  • Instruction Following: As an instruction-tuned model, it is designed to follow user prompts effectively across various tasks.
  • Long Context Processing: Its 32768-token context window allows for handling complex queries or documents that require extensive contextual understanding.

Technical Details

The model was trained using the TRL (Transformer Reinforcement Learning) framework, with specific versions of libraries including TRL 0.19.1 and Transformers 4.57.6. The GRPO method, as detailed in the DeepSeekMath paper, is central to its specialized performance.