cjiao/goldengoose-gumbel_gradsim_tau2.00-25grp

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 24, 2026Architecture:Transformer Warm

The cjiao/goldengoose-gumbel_gradsim_tau2.00-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, it utilizes the GRPO training method, as introduced in the DeepSeekMath paper, to enhance mathematical reasoning capabilities. This model is designed for tasks requiring improved logical and mathematical processing, offering a context length of 32768 tokens.

Loading preview...

Model Overview

This model, cjiao/goldengoose-gumbel_gradsim_tau2.00-25grp, is a 1.5 billion parameter language model derived from the Qwen/Qwen2.5-1.5B-Instruct architecture. It has been specifically fine-tuned using the TRL library.

Key Training Details

A notable aspect of this model's development is its training with GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization). This method was originally introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The application of GRPO suggests an optimization focus on improving reasoning and problem-solving abilities, particularly in mathematical contexts.

Technical Specifications

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Parameter Count: 1.5 billion
  • Context Length: 32768 tokens
  • Training Frameworks: TRL (version 0.19.1), Transformers (version 4.57.6), Pytorch (version 2.5.1), Datasets (version 4.8.4), Tokenizers (version 0.22.2)

Potential Use Cases

Given its fine-tuning approach, this model is likely well-suited for applications requiring:

  • Enhanced logical reasoning.
  • Mathematical problem-solving.
  • Instruction-following tasks where precise and structured outputs are beneficial.