cjiao/goldengoose-gumbel_combined_indoc_tau0.50-25grp
The cjiao/goldengoose-gumbel_combined_indoc_tau0.50-25grp model is a 1.5-billion-parameter language model developed by cjiao and fine-tuned from Qwen/Qwen2.5-1.5B-Instruct with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced to improve mathematical reasoning. It is intended for tasks that require logical and mathematical reasoning, and it keeps the Qwen2.5 architecture and its 32,768-token context length.
Model Overview
The cjiao/goldengoose-gumbel_combined_indoc_tau0.50-25grp is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was fine-tuned by cjiao using the TRL library.
Key Capabilities
- Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO drops the separate value (critic) model used in PPO and instead estimates the baseline from the rewards of a group of completions sampled for the same prompt. The method was proposed to improve a model's handling of complex mathematical and logical reasoning tasks.
- Instruction Following: Because it is fine-tuned from an instruction-tuned model, it is expected to retain the base model's ability to follow user instructions.
- Context Length: Inherits the 32,768-token context window of the Qwen2.5-1.5B-Instruct base model, which allows longer inputs and multi-turn interactions.
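The central idea of GRPO, as described in the DeepSeekMath paper, is that each completion's advantage is measured relative to the other completions sampled for the same prompt rather than against a learned value model. The sketch below shows only that normalization step in plain Python. The function name, the epsilon value, the use of the population standard deviation, and the 0/1 correctness reward are illustrative assumptions, not details reported for this model or taken from TRL's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against the group of completions for one prompt.

    GRPO-style advantage: A_i = (r_i - mean(r)) / (std(r) + eps).
    The eps term (an assumed value) avoids division by zero when all rewards match.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    # Population std here; other implementations may use the sample std instead.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four completions for one math prompt, scored 1.0 if the
# final answer is correct and 0.0 otherwise (a rule-based reward, assumed here).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

If every completion in a group receives the same reward, all advantages are zero, so that group contributes nothing through the clipped policy-gradient term; only the KL penalty, if one is used, still applies.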
Training Details
The model was trained using the TRL (Transformer Reinforcement Learning) library with the following framework versions: TRL 0.19.1, Transformers 4.57.6, PyTorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2.
Good for
- Applications involving mathematical problem-solving.
- Tasks where logical reasoning is critical.
- Instruction-following scenarios that can benefit from its reasoning-focused training.