cjiao/goldengoose-gumbel_combined_random_seed3-25grp

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 31, 2026Architecture:Transformer Cold

The cjiao/goldengoose-gumbel_combined_random_seed3-25grp is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. This model was trained using the GRPO method, as introduced in the DeepSeekMath paper, which focuses on enhancing mathematical reasoning. It is designed for general text generation tasks, leveraging its fine-tuning to provide improved conversational responses.

Loading preview...

Model Overview

The cjiao/goldengoose-gumbel_combined_random_seed3-25grp is a 1.5 billion parameter instruction-tuned language model, building upon the Qwen/Qwen2.5-1.5B-Instruct base model. It features a context length of 32768 tokens, making it suitable for processing longer inputs.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was fine-tuned using GRPO (Gumbel-Softmax Policy Optimization), a method highlighted in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This training approach aims to enhance the model's reasoning capabilities, particularly in areas that benefit from structured thought processes.

Capabilities

  • Instruction Following: Fine-tuned to respond effectively to user instructions and questions.
  • Text Generation: Capable of generating coherent and contextually relevant text based on prompts.
  • Reasoning Focus: Benefits from GRPO training, which can contribute to improved logical and mathematical reasoning in its outputs.

Use Cases

This model is well-suited for applications requiring:

  • General-purpose conversational AI.
  • Question answering systems.
  • Text summarization and generation tasks where improved reasoning might be beneficial.

Training Details

The model was trained using the TRL (Transformer Reinforcement Learning) framework, with specific versions including TRL 0.19.1 and Transformers 4.57.6. Further details on the training run are available via Weights & Biases.