rzzhan/ExGRPO-Qwen2.5-Math-7B-Zero
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | License: apache-2.0 | Architecture: Transformer | Open Weights

rzzhan/ExGRPO-Qwen2.5-Math-7B-Zero is a 7.6-billion-parameter language model built on Qwen2.5-Math-7B and released by rzzhan as part of the ExGRPO framework. The model targets mathematical reasoning: ExGRPO introduces an experience management mechanism that scores past rollouts and strategically replays high-value ones during training. This is intended to improve the efficiency and stability of reinforcement learning for complex reasoning (RLVR-style training with verifiable rewards, rather than RLHF), particularly in mathematics. The "Zero" suffix indicates that reinforcement learning starts directly from the base model, without a preliminary supervised fine-tuning stage.
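The experience-replay idea described above can be illustrated with a minimal sketch: a bounded buffer that retains only the highest-value rollouts and samples them in proportion to their value. This is an assumption-laden toy, not the actual ExGRPO implementation (class and method names here are invented for illustration).

```python
import heapq
import random

class ExperienceBuffer:
    """Toy prioritized replay buffer (illustrative only, not ExGRPO's code).

    Keeps at most `capacity` (rollout, value) pairs, discarding the
    lowest-value entries, and samples proportionally to value so that
    high-value experiences are replayed more often.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.buffer = []  # min-heap of (value, rollout) pairs

    def add(self, rollout, value):
        # Push the new experience, then evict the lowest-value one
        # if the buffer is over capacity.
        heapq.heappush(self.buffer, (value, rollout))
        if len(self.buffer) > self.capacity:
            heapq.heappop(self.buffer)

    def sample(self, k=1):
        # Value-weighted sampling: higher-value rollouts replay more.
        values = [v for v, _ in self.buffer]
        total = sum(values)
        weights = [v / total for v in values]
        picks = random.choices(self.buffer, weights=weights, k=k)
        return [rollout for _, rollout in picks]
```

In the real framework the "value" of an experience would come from training signals (e.g. reward and rollout correctness); here it is just a positive float supplied by the caller.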
