hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-highcov-batchaccgated-hotpot
Text generation · Concurrency cost: 1 · Model size: 7.6B · Quant: FP8 · Context length: 32k · Published: Apr 10, 2026 · Architecture: Transformer

hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-highcov-batchaccgated-hotpot is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to strengthen mathematical reasoning, and it supports a 32,768-token context length for processing long, complex inputs.
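As a rough sketch of the core idea behind GRPO (not this repository's training code): for each prompt, several completions are sampled and scored, and each completion's advantage is its reward normalized against the mean and standard deviation of its own group, removing the need for a separate value (critic) network. A minimal illustration:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled completion's
    reward against the mean and std of its own group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: 4 sampled answers to one question, scored by a
# rule-based correctness reward (hypothetical values).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Correct completions receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward the better samples within each group.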
