cjiao/goldengoose-gumbel-2.00-100
The cjiao/goldengoose-gumbel-2.00-100 is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, this model was trained using the TRL library and incorporates the GRPO method. Its primary differentiation lies in its training with GRPO, a technique designed to enhance mathematical reasoning capabilities. This model is suitable for tasks requiring improved reasoning, particularly in mathematical contexts.
Loading preview...
Model Overview
The cjiao/goldengoose-gumbel-2.00-100 is a 1.5 billion parameter instruction-tuned language model, building upon the foundation of Qwen/Qwen2.5-1.5B-Instruct. This model was developed by cjiao and fine-tuned using the TRL library.
Key Training Details
A significant aspect of this model's development is its training with GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", aims to enhance the model's mathematical reasoning abilities. The training process was tracked and can be visualized via Weights & Biases.
Intended Use Cases
Given its fine-tuning with the GRPO method, this model is particularly well-suited for:
- Mathematical reasoning tasks: Leveraging the GRPO training for improved performance in complex calculations and logical deductions.
- Instruction-following: As an instruction-tuned model, it can process and respond to user prompts effectively.
This model offers a compact yet capable option for applications requiring enhanced reasoning, especially within mathematical domains, building on the robust Qwen2.5 architecture.