hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-cold-math
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Mar 27, 2026 | Architecture: Transformer | Cold
hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-cold-math is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. The model is optimized for mathematical reasoning and complex problem-solving.
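As a rough illustration of the GRPO idea mentioned above: instead of training a separate value model as a baseline, GRPO samples a group of completions for each prompt and normalizes each completion's reward by the group's mean and standard deviation. The sketch below shows only that normalization step. It assumes scalar outcome rewards (for example, 1 for a correct final answer and 0 otherwise) and uses the population standard deviation; implementations differ on this detail. The function name `group_relative_advantages` is hypothetical. This is not the training code used for this model and not TRL's API.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: group-relative advantages for one prompt's completions.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumption: population std (pstdev); some implementations use the sample std.
    eps keeps the division finite when every reward in the group is identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one math prompt: two correct (1.0), two wrong (0.0).
    # Correct answers receive a positive advantage and wrong ones a negative one.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because the baseline comes from the other samples in the same group, a completion is rewarded only for doing better than its siblings. If every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.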