hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-cold-5x-math
Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Apr 6, 2026 · Architecture: Transformer (cold)
The hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-cold-5x-math model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. Developed by hector-gr, it was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, and is optimized specifically for mathematical reasoning tasks. The model supports a 32,768-token context length, making it suitable for long, multi-step problem-solving.
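The GRPO method mentioned above scores each sampled completion relative to the other completions drawn for the same prompt, normalizing rewards by the group's mean and standard deviation instead of using a learned value function. The sketch below is illustrative only (not this model's training code) and assumes simple scalar rewards such as binary correctness on a math problem:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: center each reward on the group's mean
    and scale by the group's standard deviation (illustrative sketch)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled completions for one math prompt,
# rewarded 1.0 if the final answer is correct, 0.0 otherwise.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 2) for a in advs])  # → [1.0, -1.0, -1.0, 1.0]
```

Because the advantages are computed within each group of samples, they sum to (approximately) zero: correct completions are reinforced and incorrect ones penalized relative to their peers, with no separate critic model.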