hector-gr/RLCR-v4-ks-uniqueness-hotpot-aliases
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Mar 27, 2026 | Architecture: Transformer | Cold
hector-gr/RLCR-v4-ks-uniqueness-hotpot-aliases is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B by hector-gr. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement learning method originally introduced to improve mathematical reasoning. The model is primarily optimized for tasks that require advanced reasoning, and its fine-tuning may improve performance on complex problem-solving tasks.
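To give a rough idea of what GRPO does, the sketch below shows the group-relative advantage step that gives the method its name: several completions are sampled for one prompt, each is scored, and every score is normalized against the group's mean and standard deviation. This is an illustration only, not the training code used for this model. The function name `group_relative_advantages`, the `eps` value, the choice of population standard deviation, and the reward values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against its own sampling group.

    rewards: scores for several completions of the SAME prompt.
    eps: small constant (assumed value) that avoids division by zero
         when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four sampled answers to one question:
    # 1.0 for a correct answer, 0.0 for an incorrect one.
    rewards = [1.0, 0.0, 0.0, 1.0]
    for reward, advantage in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:.1f} advantage={advantage:+.4f}")
```

Completions that score above their group's average receive a positive advantage and are reinforced, while those below average receive a negative one. Because the baseline comes from the group itself, this step needs no separate learned value model.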