hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-batchcov-hotpot
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Apr 5, 2026 | Architecture: Transformer | Cold
hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-batchcov-hotpot is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B by hector-gr. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement learning method originally introduced to improve mathematical reasoning. With a context length of 32,768 tokens, the model targets tasks that require multi-step reasoning, particularly mathematical ones.
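For readers unfamiliar with GRPO, the core idea can be illustrated with a minimal sketch: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and spread of its own group. The function name `group_relative_advantages`, the `eps` value, the choice of population standard deviation, and the example rewards are all assumptions made for illustration. This is not TRL's implementation and not the training code used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the group it was sampled with.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` (an assumed value) avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    # Population standard deviation is an illustrative choice here;
    # real implementations may use a different estimator.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt
# (1.0 = judged correct, 0.0 = judged incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline. When all completions in a group score the same, every advantage is zero and that group contributes no learning signal.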