cjiao/goldengoose-corr-v2-0.25-100
The cjiao/goldengoose-corr-v2-0.25-100 model is a 1.5 billion parameter language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, this model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring advanced reasoning, building upon the robust foundation of the Qwen2.5 architecture.
Model Overview
The cjiao/goldengoose-corr-v2-0.25-100 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. This model was developed by cjiao and leverages the TRL library for its training process.
Key Capabilities
- Enhanced Reasoning: The model's training incorporated the GRPO (Group Relative Policy Optimization) method, as introduced in the DeepSeekMath paper. This technique is specifically designed to push the limits of mathematical and general reasoning in open language models.
- Instruction Following: Building on the Qwen2.5-1.5B-Instruct foundation, it retains strong instruction-following capabilities, making it suitable for various prompt-based tasks.
- Efficient Performance: With 1.5 billion parameters and a context length of 32,768 tokens, it offers a balance between capability and computational efficiency.
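Like other Qwen2.5-Instruct derivatives, the model should work with the standard Hugging Face transformers chat workflow. A minimal sketch (the `generate` helper and its sampling settings are illustrative, not recommendations from the model author):

```python
MODEL_ID = "cjiao/goldengoose-corr-v2-0.25-100"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Run one chat-style completion against the model.
    Imports are kept local so the sketch can be read without
    transformers installed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    # Qwen2.5-Instruct models expect the chat template format.
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Strip the prompt tokens before decoding the completion.
    completion = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(completion, skip_special_tokens=True)
```

For reasoning tasks, a prompt such as `generate("Solve step by step: what is 17 * 24?")` exercises the chain-of-thought behavior the GRPO fine-tuning targets.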
Training Details
The model was fine-tuned using the TRL library, with specific framework versions including TRL 0.19.1, Transformers 4.57.6, PyTorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2. The application of the GRPO method suggests a focus on improving logical and mathematical problem-solving abilities.
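The distinguishing step in GRPO, as described in the DeepSeekMath paper, is that it replaces a learned value baseline with group statistics: several completions are sampled per prompt, and each completion's advantage is its reward normalized by the mean and standard deviation of rewards within that group. A minimal sketch of this advantage computation (the function name is illustrative; this is not the actual training code for this model):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Compute GRPO-style advantages for one group of sampled
    completions: normalize each reward by the group mean and
    standard deviation (eps avoids division by zero when all
    rewards in the group are equal)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for the same prompt, scored 1.0 if the final
# answer is correct and 0.0 otherwise (a common reward scheme for
# math reasoning):
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions get positive advantage, incorrect ones
# negative; the advantages sum to zero within the group.
```

These advantages then weight the policy-gradient update, so the model is pushed toward completions that score above its own group average. In practice, the TRL library packages this loop as `GRPOTrainer`.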
Good For
- Applications requiring mathematical reasoning and complex problem-solving.
- Tasks where instruction following and coherent text generation are crucial.
- Scenarios needing a relatively compact yet capable language model for reasoning-intensive workloads.