hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-highcov-accgated-hotpot
Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Apr 9, 2026 · Architecture: Transformer · Cold
hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-highcov-accgated-hotpot is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. Developed by hector-gr, it was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath research to improve mathematical reasoning. The model is optimized for tasks that require advanced reasoning.
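As a rough illustration of the group-relative idea behind GRPO, the sketch below normalizes the rewards of several completions sampled for the same prompt against that group's own mean and standard deviation. It is not the author's training code: the function name `group_relative_advantages`, the use of the population standard deviation, the `eps` value, and the reward values are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Return one advantage per completion in a single prompt's group.

    Each reward is centered on the group mean and scaled by the group
    standard deviation; eps (an assumed value) avoids division by zero
    when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline.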