Elliott/LUFFY-Qwen-Math-7B-Zero
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:Apr 19, 2025License:mitArchitecture:Transformer0.0K Open Weights Cold

Elliott/LUFFY-Qwen-Math-7B-Zero is a 7.6 billion parameter model based on the Qwen architecture, developed by Elliott. It utilizes a reinforcement learning framework that integrates off-policy reasoning traces and policy shaping to enhance learning. This model is specifically optimized for complex mathematical reasoning and generalization, achieving state-of-the-art results among zero-RL methods on competitive math benchmarks and out-of-distribution tasks.

Loading preview...