cjiao/goldengoose-divsweep_goose_n128_indorc_tau0.50-25grp
The cjiao/goldengoose-divsweep_goose_n128_indorc_tau0.50-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities in large language models. This model is particularly suited for tasks requiring robust logical and mathematical problem-solving, building upon the strong foundation of the Qwen2.5 architecture. Its 32K context length supports processing longer and more complex prompts for detailed analysis.
Loading preview...
Overview
This model, goldengoose-divsweep_goose_n128_indorc_tau0.50-25grp, is a 1.5 billion parameter instruction-tuned variant of the Qwen2.5-1.5B-Instruct base model. It has been specifically fine-tuned using the TRL framework, incorporating the GRPO (Guided Reasoning Policy Optimization) method. GRPO is a training technique introduced in the context of DeepSeekMath, aiming to significantly improve a model's mathematical reasoning abilities.
Key Capabilities
- Enhanced Mathematical Reasoning: Leverages the GRPO training method to boost performance on mathematical and logical tasks.
- Instruction Following: Built upon an instruction-tuned base model, it is designed to follow user prompts effectively.
- Context Handling: Supports a substantial context length of 32,768 tokens, allowing for processing of complex and lengthy inputs.
Training Details
The model's training procedure utilized GRPO, a method detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This indicates a focus on developing strong analytical and problem-solving skills. The fine-tuning was performed using the TRL library (Transformer Reinforcement Learning).
Good For
- Applications requiring strong mathematical problem-solving.
- Tasks that benefit from enhanced logical reasoning.
- Scenarios where detailed instruction following and longer context understanding are crucial.