weizhepei/rlcr_hotpot_test
Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Concurrency cost: 1 · Architecture: Transformer · Published: Mar 26, 2026
weizhepei/rlcr_hotpot_test is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper for improving mathematical reasoning. This training procedure makes the model particularly suited to tasks that require advanced, multi-step reasoning.
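A minimal usage sketch, assuming the checkpoint loads through the standard Hugging Face transformers API it inherits from its Qwen2.5-7B base; the example question is an illustrative multi-hop prompt, not taken from the model's training data.

# Sketch: load the model and generate an answer with transformers.
# Assumes the repo id and standard AutoModel loading; dtype and device
# placement are left to the checkpoint and available hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "weizhepei/rlcr_hotpot_test"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # place weights on available GPU(s) or CPU
)

# Hypothetical multi-hop question to exercise the reasoning behavior.
prompt = "Which country is the author of 'The Old Man and the Sea' from, and what is its capital? Answer step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; the 32k context window allows much longer prompts.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))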