cjiao/goldengoose-top25_gradsim_polar-25grp
Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 17, 2026 | Architecture: Transformer
The cjiao/goldengoose-top25_gradsim_polar-25grp is a 1.5 billion parameter instruction-tuned language model, fine-tuned by cjiao from the Qwen2.5-1.5B-Instruct base model. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced to improve mathematical reasoning. The model targets tasks that call for stronger logical and mathematical problem-solving on top of its Qwen2.5 foundation.
Overview
The cjiao/goldengoose-top25_gradsim_polar-25grp is a 1.5 billion parameter language model fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was developed by cjiao and trained with the TRL (Transformer Reinforcement Learning) library.
Key Capabilities
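- Enhanced Mathematical Reasoning: The main differentiator of this model is its training with GRPO (Group Relative Policy Optimization), a method introduced in the DeepSeekMath paper. For each prompt, GRPO samples a group of completions and scores each one relative to the other completions in its group, which removes the need for a separate value (critic) model. The technique is intended to improve the model's ability to handle mathematical reasoning tasks.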
- Instruction Following: As an instruction-tuned model, it is designed to respond to user prompts and follow the instructions they contain.
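The "group relative" part of GRPO can be illustrated with a minimal sketch of its advantage computation: each sampled completion's reward is normalized against the mean and standard deviation of its own group, so no learned value model is needed. This is not this model's actual training code. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation (some implementations use the sample estimate), and the binary correctness rewards in the example are all assumptions chosen for illustration.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Hypothetical helper: GRPO-style advantages for one group of completions.

    All rewards must come from completions sampled for the SAME prompt.
    Advantage_i = (r_i - mean(group)) / (std(group) + eps).
    Population std and eps=1e-4 are assumptions of this sketch.
    """
    if not rewards:
        raise ValueError("a group needs at least one reward")
    n = len(rewards)
    mean = sum(rewards) / n
    # Population standard deviation of the group's rewards.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps keeps the division finite when every completion got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards: four answers to one math prompt, 1.0 if correct, else 0.0.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # A group where every answer is scored the same yields all-zero advantages.
    print(group_relative_advantages([1.0, 1.0, 1.0]))
```

In the first example the correct answers receive an advantage of roughly +1 and the incorrect ones roughly -1, so the policy update pushes probability toward the better completions within the group. In the second, every advantage is zero, so that prompt contributes no learning signal.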
Good For
- Applications requiring robust mathematical problem-solving.
- Tasks where logical reasoning is crucial.
- Developers looking for a compact yet capable model with specialized reasoning enhancements.