hector-gr/RLCR-v4-ks-batch-frontier-combo-hotpot
Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Mar 28, 2026 · Architecture: Transformer

hector-gr/RLCR-v4-ks-batch-frontier-combo-hotpot is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B using GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, to enhance mathematical reasoning. It is optimized for complex reasoning tasks, particularly those requiring structured problem-solving, and its 32,768-token context length suits applications that demand deep contextual understanding and logical inference.
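As a brief illustration of the training method named above: GRPO dispenses with a learned value model and instead scores each sampled completion against the other completions drawn for the same prompt, normalizing rewards within the group. The sketch below shows that group-relative advantage step only; the function name is illustrative, and whether the population or sample standard deviation is used is an implementation detail not specified here.

```python
# Minimal sketch of GRPO's group-relative advantage computation
# (DeepSeekMath): A_i = (r_i - mean(r)) / std(r) over the group of
# completions sampled for one prompt. Pure stdlib; names are illustrative.
from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """Normalize each completion's reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Rewards for four completions sampled for the same prompt:
# correct answers get 1.0, incorrect get 0.0.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # → [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group's mean receive positive advantages and are reinforced; the rest are penalized, which is what pushes the policy toward correct structured reasoning without a separate critic.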
