cjiao/goldengoose-corr-v2-0.50-100
The cjiao/goldengoose-corr-v2-0.50-100 model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring improved reasoning, particularly in mathematical contexts, and supports a 32K context length.
Model Overview
cjiao/goldengoose-corr-v2-0.50-100 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It supports a 32,768-token context window, making it suitable for processing longer inputs.
Key Capabilities
- Enhanced Reasoning: This model was trained using GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. This training approach is designed to improve the model's mathematical reasoning abilities.
- Instruction Following: As a fine-tuned instruction model, it is designed to follow user prompts effectively, building upon the capabilities of its Qwen2.5-1.5B-Instruct base.
- Efficient Fine-tuning: The model was trained with the TRL (Transformer Reinforcement Learning) library, which provides implementations of reinforcement-learning fine-tuning methods such as GRPO.
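To make the GRPO setup concrete, here is a minimal sketch of the kind of custom reward function TRL's `GRPOTrainer` can consume. The function name, the answer-matching heuristic, and the `ground_truths` argument are illustrative assumptions, not the actual reward used to train this checkpoint; in TRL, reward functions receive sampled completions and return one scalar score per completion, and GRPO then normalizes those rewards within each prompt's sampling group to form advantages.

```python
import re

# Hypothetical correctness reward of the kind GRPO training uses:
# score each sampled completion, e.g. 1.0 if its final number matches
# the reference answer, else 0.0. GRPO compares these scores within
# a group of completions for the same prompt.
def correctness_reward(completions, ground_truths):
    """Return 1.0 when the last number in a completion matches the answer."""
    rewards = []
    for completion, truth in zip(completions, ground_truths):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(truth) else 0.0)
    return rewards

# The (omitted) training loop would pass a function like this to
# trl.GRPOTrainer, roughly:
#   trainer = GRPOTrainer(model=..., reward_funcs=[correctness_reward],
#                         args=GRPOConfig(...), train_dataset=...)
```

A binary exact-match reward like this is deliberately sparse; real math-reasoning setups often combine it with format or length rewards.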
When to Use This Model
This model is particularly well-suited for applications where robust mathematical reasoning and accurate instruction following are critical, especially when deployment constraints call for a small 1.5 billion parameter model. Its GRPO training makes it a strong candidate for tasks involving numerical problems, logical deduction, and other reasoning-intensive scenarios.
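A minimal inference sketch using Hugging Face `transformers` is shown below. The helper names and generation settings are assumptions for illustration (the model card does not prescribe them), and `device_map="auto"` additionally requires the `accelerate` package.

```python
MODEL_ID = "cjiao/goldengoose-corr-v2-0.50-100"

def build_chat(problem: str) -> list[dict]:
    """Wrap a math problem as a single-turn chat message list."""
    return [{"role": "user", "content": problem}]

def generate_answer(problem: str, max_new_tokens: int = 512) -> str:
    # transformers is imported lazily so build_chat stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    # Apply the model's chat template and generate a completion.
    input_ids = tokenizer.apply_chat_template(
        build_chat(problem), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

For example, `generate_answer("What is 17 * 23?")` would download the checkpoint and return the model's worked answer as a string.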