hector-gr/RLCR-v4-ks-uniqueness-hotpot-aliases-acceptedanswersfix
Task: Text generation
Concurrency cost: 1
Model size: 7.6B
Quantization: FP8
Context length: 32K
Published: Apr 6, 2026
Architecture: Transformer
Status: Cold

hector-gr/RLCR-v4-ks-uniqueness-hotpot-aliases-acceptedanswersfix is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B, with a 32K context length. Developed by hector-gr, it was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in DeepSeekMath to enhance mathematical reasoning. The model is intended for tasks that require robust reasoning and complex problem solving.
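As a text-generation model hosted on the Hugging Face Hub, it should be loadable with the standard `transformers` API. The sketch below is an assumption, not from this model card: the `generate_answer` helper, the prompt, and the dtype/device settings are illustrative, and running it requires `transformers`, `torch`, and enough GPU memory for a 7.6B model.

```python
# Minimal sketch of local inference with Hugging Face transformers.
# Everything below except the model ID is an assumption, not taken
# from the model card; adjust dtype/device settings to your hardware.

MODEL_ID = "hector-gr/RLCR-v4-ks-uniqueness-hotpot-aliases-acceptedanswersfix"


def generate_answer(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the sketch can be read/imported without the
    # (heavy) dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # pick up the checkpoint's native dtype
        device_map="auto",    # place weights on available GPU(s)
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_answer("If x + 3 = 7, what is x? Think step by step."))
```

For production use, a dedicated serving stack such as vLLM would typically be preferred over raw `generate` calls, particularly given the FP8 quantization advertised above.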
