cjiao/goldengoose-high_div_rand-25grp
The cjiao/goldengoose-high_div_rand-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the TRL framework and incorporates the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is particularly suited for tasks requiring robust reasoning, especially in mathematical contexts, leveraging its 32768 token context length.
Loading preview...
Model Overview
cjiao/goldengoose-high_div_rand-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages a substantial 32768 token context length, making it suitable for processing longer inputs and maintaining conversational coherence over extended interactions.
Key Capabilities
- Enhanced Mathematical Reasoning: This model was trained using the GRPO (Gradient-based Reasoning Policy Optimization) method, as introduced in the "DeepSeekMath" paper. This training approach specifically targets and improves the model's ability to handle complex mathematical reasoning tasks.
- Instruction Following: As a fine-tuned instruction model, it is designed to accurately interpret and execute user prompts and instructions.
- Qwen2.5 Architecture: Built upon the Qwen2.5 series, it inherits the foundational strengths of this architecture, known for its general language understanding and generation capabilities.
Training Details
The model was fine-tuned using the TRL framework, with specific versions of libraries including TRL 0.19.1 and Transformers 4.57.6. The integration of the GRPO method is a key differentiator, focusing its optimization on reasoning performance.
Good For
- Applications requiring strong mathematical problem-solving.
- Tasks where precise instruction following is critical.
- Scenarios benefiting from a model with a large context window for detailed analysis or extended dialogues.