cjiao/goldengoose-gumbel_combined_indoc_tau2.00-25grp
cjiao/goldengoose-gumbel_combined_indoc_tau2.00-25grp is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. This model was trained using GRPO, a method designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring robust reasoning, particularly in mathematical contexts, and features a 32768-token context length.
Loading preview...
Model Overview
cjiao/goldengoose-gumbel_combined_indoc_tau2.00-25grp is a 1.5 billion parameter instruction-tuned model, building upon the Qwen/Qwen2.5-1.5B-Instruct architecture. It has been fine-tuned using the TRL library.
Key Training Methodology
A distinguishing feature of this model is its training with GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is specifically designed to improve a model's mathematical reasoning abilities. The training process was tracked and can be visualized via Weights & Biases.
Capabilities and Use Cases
Given its fine-tuning with GRPO, this model is particularly well-suited for:
- Mathematical Reasoning Tasks: Excelling in problems that require logical and mathematical deduction.
- Instruction Following: As an instruction-tuned model, it can effectively follow user prompts for various tasks.
- General Language Generation: Capable of generating coherent and contextually relevant text, leveraging its base Qwen2.5 architecture.
This model offers a compact yet capable solution for applications where enhanced reasoning, especially in quantitative domains, is beneficial.