hector-gr/RLCR-v4-ks-uniqueness-buf5k-noece-noaurc-hotpot
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:Mar 28, 2026Architecture:Transformer Cold

The hector-gr/RLCR-v4-ks-uniqueness-buf5k-noece-noaurc-hotpot model is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained using the TRL framework and incorporates the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring advanced reasoning, particularly those benefiting from the GRPO training approach.

Loading preview...