hector-gr/RLCR-v4-ks-uniqueness-hotpot
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Mar 15, 2026 | Architecture: Transformer | Cold

hector-gr/RLCR-v4-ks-uniqueness-hotpot is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement learning method originally introduced to improve mathematical reasoning in language models. The model is intended for reasoning-heavy tasks, particularly those that benefit from mathematical understanding.
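As a rough illustration of the idea behind GRPO, the sketch below computes group-relative advantages: several completions are sampled for the same prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group. This is not the training code used for this model. The function name `group_relative_advantages`, the `eps` value, the choice of population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration only: each advantage is
    (reward - group mean) / (group std + eps). The eps term avoids
    division by zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Completions that score above their group's average get a positive advantage and are reinforced; those below the average get a negative one. When all completions in a group receive the same reward, every advantage is zero and that group contributes no learning signal.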
