hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy50-hotpot
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Mar 25, 2026 | Architecture: Transformer | Cold

The hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy50-hotpot model is a 7.6-billion-parameter language model fine-tuned by hector-gr from the Qwen/Qwen2.5-7B base model. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath work to improve mathematical reasoning. The model is aimed at tasks that require multi-step reasoning, adding specialized training for complex problem-solving on top of its Qwen2.5 foundation.
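To make the GRPO reference more concrete, here is a minimal sketch of the group-relative advantage step at the core of the method: several completions are sampled for the same prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group. This is an illustration only, not the author's training code. The function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration; eps guards against division by
    zero when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Completions scoring above the group mean get a positive advantage,
# those scoring below it get a negative one.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Because the baseline comes from the group itself, this step needs no separate value model; a group in which every completion earns the same reward yields zero advantage for all of them.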
