hector-gr/RLCR-v4-ks-uniqueness-noece-noaurc-hotpot
Task: text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Mar 28, 2026 · Architecture: Transformer (cold)

The hector-gr/RLCR-v4-ks-uniqueness-noece-noaurc-hotpot model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B using the TRL framework. It was trained with GRPO, a reinforcement-learning method introduced in the DeepSeekMath paper to enhance mathematical reasoning. The model is suited to tasks requiring advanced reasoning and problem-solving, particularly where mathematical understanding helps, and supports a 32768-token context length.
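A minimal sketch of querying the model for a reasoning task, assuming the standard Hugging Face `transformers` causal-LM API (the generation settings and example prompt are illustrative, not from the model card):

```python
MODEL_ID = "hector-gr/RLCR-v4-ks-uniqueness-noece-noaurc-hotpot"


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate an answer to a single question.

    Imports are local so the module can be imported without transformers
    installed; loading the 7.6B checkpoint requires a suitable GPU.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # respects the published FP8/quantized weights where supported
        device_map="auto",
    )
    # Apply the tokenizer's chat template so the prompt matches training format.
    messages = [{"role": "user", "content": question}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("A train travels 60 km in 45 minutes. What is its average speed in km/h?"))
```

Prompts should stay within the 32k context window; for multi-hop question answering (the "hotpot" in the model name suggests HotpotQA-style data, though the card does not confirm this), supporting passages can be prepended to the user message.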
